On Wed, Sep 24, 2014 at 4:03 PM, Arpit Agarwal <aagar...@hortonworks.com> wrote:
>> I would appreciate it in the future if benchmarks were posted a day or
>> two before a merge vote on a performance improvement is called.
>> I feel that it would have been better to float the idea of a merge on
>> the JIRA before actually calling it,
>
> Colin, I noted my intention to start the merge vote on the Jira last week.
> In response you asked for verification that read performance will not
> regress. Read results were posted prior to starting the vote. I agree
> that the write numbers were not posted until one day after the merge vote
> started. If we delay the vote expiration by a day, will it help address
> any remaining timing concerns?
Thank you for the offer. I am OK with the current vote expiration time,
now that we've seen more benchmarks and discussed more potential issues.
I am -0 on the merge vote, but I do want to note that I consider
HDFS-7141 to be a blocker for merging to branch-2.6.

>> There are two areas where I think we need more clarification. The
>> first is whether other eviction strategies can be implemented besides
>> LRU.
>
> If 2Q or another scheme requires more hooks, we can certainly add to the
> pluggable interface. It is not a public interface that is set in stone.
> It is similar to BlockPlacementPolicy and VolumeChoosingPolicy. Both are
> HDFS-internal interfaces with alternative implementations that we rev
> frequently.

I implemented a modified 2Q today on HDFS-7142... I'd appreciate a
review. Although I haven't had time to do a lot of testing on it, I
think this removes it as a blocker in my mind.

>> The other area is how the memory management for HDFS-6581 fits in with
>> the memory management for HDFS-4949. I am very concerned that this
>> has been handwaved away as future work. If we create a configuration
>> mess for system administrators as a result, I will be sad.
>
> You asked about memory management on 8/21. I responded the same day,
> stating that we are not introducing any configuration settings since
> mounting the disk is done by an administrator. Your response did not
> indicate any concern with this approach. You can see the comment history
> on HDFS-6581. So I am surprised you raise it as a blocker one month
> later. We have not introduced new limits as part of this change, so
> there is no concern of future config incompatibilities. The Cache
> Executor and Lazy Writer can be updated to be aware of each other's
> memory usage in a compatible way.
>
> What we are voting on here is merging to trunk. If you have additional
> reasonable concerns that you would like to see addressed prior to 2.6,
> we can have a separate discussion about 2.6 blockers.
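[Editorial note: for readers unfamiliar with the 2Q scheme discussed above: the idea is to admit a block on first access into a small probationary FIFO queue and promote it to the main LRU queue only if it is touched again, so a one-time scan churns the probationary queue without flushing the hot set. The toy sketch below is a simplified 2Q in plain Python, illustrative only; it omits 2Q's ghost (A1out) queue, is not the HDFS-7142 patch, and the queue names are ours.]

```python
from collections import OrderedDict

class TwoQCache:
    """Toy simplified-2Q cache: first-touch keys enter a small FIFO
    probationary queue (a1); a key touched again while resident is
    promoted to the main LRU queue (am). A one-time scan therefore
    only churns a1 and cannot flush the hot set held in am."""

    def __init__(self, a1_size, am_size):
        self.a1 = OrderedDict()  # probationary FIFO
        self.am = OrderedDict()  # main LRU
        self.a1_size, self.am_size = a1_size, am_size

    def access(self, key):
        if key in self.am:            # hot key: refresh its LRU position
            self.am.move_to_end(key)
        elif key in self.a1:          # second touch: promote to am
            del self.a1[key]
            self.am[key] = True
            if len(self.am) > self.am_size:
                self.am.popitem(last=False)   # evict LRU entry of am
        else:                         # first touch: probation only
            self.a1[key] = True
            if len(self.a1) > self.a1_size:
                self.a1.popitem(last=False)   # evict oldest probationer

    def __contains__(self, key):
        return key in self.am or key in self.a1

cache = TwoQCache(a1_size=4, am_size=4)
for k in ("hot1", "hot2"):       # touch twice to promote into am
    cache.access(k)
    cache.access(k)
for i in range(1000):            # one-time sequential scan
    cache.access(("scan", i))
# hot entries survive the scan; under plain LRU they would not
```

The scan keys are each touched once, so they never leave the probationary queue and the promoted hot entries are untouched.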
I think it's fine to address HDFS-7141 after the merge to trunk. But as
I noted above, I think we absolutely need to address it before the merge
to 2.6. We are starting to see a lot of users of HDFS-4949, and I want
to make sure that there is a reasonable story for using both features at
the same time. Let's continue this discussion on HDFS-6919 and HDFS-6988
and see if we can come up with a solution that works for everyone.

best,
Colin

>
> Regards,
> Arpit
>
>
> On Wed, Sep 24, 2014 at 2:19 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
>
>> On Wed, Sep 24, 2014 at 11:12 AM, Suresh Srinivas
>> <sur...@hortonworks.com> wrote:
>> > Features are done in a separate branch for two reasons: 1) During
>> > feature development the branch may not be functional. 2) The
>> > high-level approach and design are not very clear, and development
>> > can continue while that is being sorted out.
>>
>> Up until the last two days, we had no benchmarks at all, so we didn't
>> have any way to evaluate whether this performance improvement was
>> functional. Andrew commented on this as well in this thread, and we
>> also raised the issue on the JIRA. I am glad that a few benchmarks
>> have finally been posted. I would appreciate it in the future if
>> benchmarks were posted a day or two before a merge vote on a
>> performance improvement is called. As it is, it feels like we are
>> racing the clock right now to figure out how well this works, and it
>> puts the reviewers in an unpleasant position.
>>
>> > In case of this feature, clearly (2) is not an issue. We have had
>> > enough discussion about the approach. I also think this branch is
>> > ready for merge without rendering trunk non-functional.
>>
>> I agree that this can be merged without rendering trunk
>> non-functional. I don't agree that we have achieved consensus on all
>> of the high-level approach and design.
>>
>> There are two areas where I think we need more clarification.
>> The first is whether other eviction strategies can be implemented
>> besides LRU. Yes, there is a pluggable interface, but I am concerned
>> that it may not be flexible enough to implement anything besides LRU.
>> I am working on a patch now that implements 2Q. I will find out for
>> myself, I guess.
>>
>> The other area is how the memory management for HDFS-6581 fits in
>> with the memory management for HDFS-4949. I am very concerned that
>> this has been handwaved away as future work. If we create a
>> configuration mess for system administrators as a result, I will be
>> sad. This is the kind of design that I would have liked to see done a
>> while ago. I would certainly veto any merge to 2.6 until we have
>> figured this out.
>>
>> I commented about this a month ago:
>> https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025
>> But my concerns were not addressed in the branch.
>>
>> If these two design clarifications can be made in time for the merge,
>> I will be -0 on this. I have opened HDFS-7141 and HDFS-7142 to
>> continue the discussion on these points.
>>
>> > That leaves us with one objection that is being raised: the content
>> > is not complete. This did not prevent the recently merged crypto
>> > file system work from being merged to trunk. We discussed and
>> > decided that it could be merged to trunk and we would finish the
>> > remaining must-have work in trunk before merging to branch-2. That
>> > is a reasonable approach for this feature as well. In fact there
>> > are a lot of bug fixes and improvements still happening on the
>> > crypto work even after the branch-2 merge (which is a good thing in
>> > my opinion; it shows that the testing is continuing and the feature
>> > is still being improved).
>>
>> I am OK with merging this before it is complete.
>> I was just pointing out that this could have benefited from more time
>> in development. It would have been a much smoother merge and less
>> stressful for everyone.
>>
>> > As regards the eviction policy, we have had discussion in the jira.
>> > I disagree that there exists a "usable" or "better" strategy that
>> > works well for all workloads. I also disagree that an additional
>> > policy beyond the existing LRU is needed, let alone that it is
>> > needed for the trunk merge. New eviction policies need to be
>> > developed as people experiment with the feature. Hence making the
>> > eviction policy pluggable is a great way to approach it.
>>
>> LRU is a poor policy for scan workloads. For example, see here:
>> http://www.cse.iitd.ernet.in/~sbansal/os/lec/l30.html
>> "In general, LRU works well, but there are certain workloads where
>> LRU performs very poorly. One well-known workload is a "scan" wherein
>> a large memory region (larger than physical memory) is accessed in
>> quick succession. e.g... Such scans are often quite common, as some
>> thread may want to go through all its data. And so, plain LRU (or
>> CLOCK) does not work as well in practice. Instead a variant of LRU
>> that considers both recency and frequency of access to a page is
>> typically used in an OS (e.g., 2Q, CLOCK with Adaptive Replacement,
>> etc.)."
>>
>> So is your argument that we are not using this for scanning
>> workloads? That seems tough to believe, and as I said earlier, you
>> have posted no system-level benchmarks, so how can I evaluate that
>> claim?
>>
>> > There is some testing information posted on the jira, such as
>> > performance. But I think we need more details on how this feature
>> > is being tested. Arpit, can you please post test details? As for
>> > me, given how much work (both discussion and coding) has gone into
>> > it, and given it is a very important feature that allows
>> > experimentation with the memory tier, I would like to see this
>> > available in release 2.6.
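[Editorial note: the scan problem in the quoted lecture notes is easy to reproduce: with plain LRU, one sequential pass over more distinct blocks than the cache holds evicts every hot entry, no matter how frequently those entries were accessed before. A toy illustration in plain Python, not HDFS code:]

```python
from collections import OrderedDict

def lru_access(cache, key, capacity):
    """Touch `key` in an OrderedDict used as a plain LRU cache,
    evicting the least-recently-used entry when over capacity."""
    if key in cache:
        cache.move_to_end(key)          # refresh recency
    else:
        cache[key] = True
        if len(cache) > capacity:
            cache.popitem(last=False)   # evict LRU entry

cache = OrderedDict()
for k in ("hot1", "hot2", "hot3"):      # frequently used working set
    for _ in range(100):
        lru_access(cache, k, capacity=8)
for i in range(1000):                   # one sequential scan, each block once
    lru_access(cache, ("scan", i), capacity=8)
# the scan flushed every hot entry despite their 100x access frequency;
# the cache now holds only the tail of the one-time scan keys
```

Scan-resistant policies (2Q, ARC, and friends) avoid exactly this by weighing frequency as well as recency.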
>> Just to reiterate, I am -0 provided we can address HDFS-7141 and
>> HDFS-7142 before this gets set in stone. I would hate to -1 this,
>> because it would mean that you could not call another vote for a
>> week. But I feel that it would have been better to float the idea of
>> a merge on the JIRA before actually calling it, to avoid having
>> discussions like this where we are racing the clock.
>>
>> thanks,
>> Colin
>>
>> > On Tue, Sep 23, 2014 at 6:09 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
>> >
>> >> This seems like a really aggressive timeframe for a merge. We
>> >> still haven't implemented:
>> >>
>> >> * Checksum skipping on read and write from lazy-persisted replicas.
>> >> * Allowing mmapped reads from the lazy-persisted data.
>> >> * Any eviction strategy other than LRU.
>> >> * Integration with cache pool limits (how do HDFS-4949 and lazy
>> >>   persist replicas share memory?)
>> >> * Eviction from RAM disk via truncation (HDFS-6918).
>> >> * Metrics.
>> >> * System testing to find out how useful this is, and what the best
>> >>   eviction strategy is.
>> >>
>> >> I see why we might want to defer checksum skipping, metrics,
>> >> allowing mmap, eviction via truncation, and so forth until later.
>> >> But I feel like we need to figure out how this will integrate with
>> >> the memory used by HDFS-4949 before we merge. I also would like to
>> >> see another eviction strategy besides LRU, which is a very poor
>> >> eviction strategy for scanning workloads. I mentioned this a few
>> >> times on the JIRA.
>> >>
>> >> I'd also like to get some idea of how much testing this has
>> >> received in a multi-node cluster. What makes us confident that
>> >> this is the right time to merge, rather than in a week or two?
>> >>
>> >> best,
>> >> Colin
>> >>
>> >>
>> >> On Tue, Sep 23, 2014 at 4:55 PM, Arpit Agarwal <aagar...@hortonworks.com> wrote:
>> >> > I have posted write benchmark results to the Jira.
>> >> >
>> >> > On Tue, Sep 23, 2014 at 3:41 PM, Arpit Agarwal <aagar...@hortonworks.com> wrote:
>> >> >
>> >> >> Hi Andrew, I said "it is not going to be a substantial fraction
>> >> >> of memory bandwidth". That is certainly not the same as saying
>> >> >> it won't be good or there won't be any improvement.
>> >> >>
>> >> >> Any time you have transfers over RPC or the network stack, you
>> >> >> will not get close to the memory bandwidth, even for intra-host
>> >> >> transfers.
>> >> >>
>> >> >> I'll add some micro-benchmark results to the Jira shortly.
>> >> >>
>> >> >> Thanks,
>> >> >> Arpit
>> >> >>
>> >> >> On Tue, Sep 23, 2014 at 2:33 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>> >> >>
>> >> >>> Hi Arpit,
>> >> >>>
>> >> >>> Here is the comment. It was certainly not my intention to
>> >> >>> misquote anyone.
>> >> >>>
>> >> >>> https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14138223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14138223
>> >> >>>
>> >> >>> Quote:
>> >> >>>
>> >> >>> It would be nice to see that we could get a substantial
>> >> >>> fraction of memory bandwidth when writing to a single replica
>> >> >>> in-memory.
>> >> >>>
>> >> >>> The comparison will be interesting but I can tell you without
>> >> >>> measurement it is not going to be a substantial fraction of
>> >> >>> memory bandwidth. We are still going through
>> >> >>> DataTransferProtocol with all the copies and overhead that
>> >> >>> involves.
>> >> >>>
>> >> >>> When the goal is in-memory writes and we are unable to achieve
>> >> >>> a substantial fraction of memory bandwidth, to me that is "not
>> >> >>> good performance."
>> >> >>>
>> >> >>> I also looked through the subtasks, and AFAICT the only one
>> >> >>> related to improving this is deferring checksum computation.
>> >> >>> The benchmarking we did on HDFS-4949 showed that this only
>> >> >>> really helps when you're down to a single copy or zero copies
>> >> >>> with SCR/ZCR. DTP reads didn't see much of an improvement, so
>> >> >>> I'd guess the same would be true for DTP writes.
>> >> >>>
>> >> >>> I think my above three questions are still open, as well as my
>> >> >>> question about why we're merging now, as opposed to when the
>> >> >>> performance of the branch is proven out.
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Andrew
>> >> >>>
>> >> >>> On Tue, Sep 23, 2014 at 2:10 PM, Arpit Agarwal <aagar...@hortonworks.com> wrote:
>> >> >>>
>> >> >>> > Andrew, don't misquote me. Can you link the comment where I
>> >> >>> > said performance wasn't going to be good?
>> >> >>> >
>> >> >>> > I will add some preliminary write results to the Jira later
>> >> >>> > today.
>> >> >>> >
>> >> >>> > > What's the plan to improve write performance?
>> >> >>> > I described this in response to your and Colin's comments on
>> >> >>> > the Jira.
>> >> >>> >
>> >> >>> > For the benefit of folks not following the Jira, the
>> >> >>> > immediate task we'd like to get done post-merge is moving
>> >> >>> > checksum computation off the write path. Also see the open
>> >> >>> > subtasks of HDFS-6581 for other planned perf improvements.
>> >> >>> >
>> >> >>> > Thanks,
>> >> >>> > Arpit
>> >> >>> >
>> >> >>> >
>> >> >>> > On Tue, Sep 23, 2014 at 1:07 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>> >> >>> >
>> >> >>> > > Hi Arpit,
>> >> >>> > >
>> >> >>> > > On HDFS-6581, I asked for write benchmarks on Sep 19th, and
>> >> >>> > > you responded that the performance wasn't going to be good.
>> >> >>> > > However, I thought the primary goal of this JIRA was to
>> >> >>> > > improve write performance, and write performance is listed
>> >> >>> > > as the first feature requirement in the design doc.
>> >> >>> > >
>> >> >>> > > So, this leads me to a few questions, which I also asked
>> >> >>> > > last week on the JIRA (I believe still unanswered):
>> >> >>> > >
>> >> >>> > > - What's the plan to improve write performance?
>> >> >>> > > - What kind of performance can we expect after the plan is
>> >> >>> > >   completed?
>> >> >>> > > - Can this expected performance be validated with a
>> >> >>> > >   prototype?
>> >> >>> > >
>> >> >>> > > Even with these questions answered, I don't understand the
>> >> >>> > > need to merge this before the write optimization work is
>> >> >>> > > completed. Write perf is listed as a feature requirement,
>> >> >>> > > so the branch can reasonably be called not feature complete
>> >> >>> > > until it's shown to be faster.
>> >> >>> > >
>> >> >>> > > Thanks,
>> >> >>> > > Andrew
>> >> >>> > >
>> >> >>> > > On Tue, Sep 23, 2014 at 11:47 AM, Jitendra Pandey <jiten...@hortonworks.com> wrote:
>> >> >>> > >
>> >> >>> > > > +1. I have reviewed most of the code in the branch, and I
>> >> >>> > > > think it's ready to be merged to trunk.
>> >> >>> > > >
>> >> >>> > > > On Mon, Sep 22, 2014 at 5:24 PM, Arpit Agarwal <aagar...@hortonworks.com> wrote:
>> >> >>> > > >
>> >> >>> > > > > HDFS Devs,
>> >> >>> > > > >
>> >> >>> > > > > We propose merging the HDFS-6581 development branch to
>> >> >>> > > > > trunk.
>> >> >>> > > > >
>> >> >>> > > > > The work adds support for writing HDFS blocks in
>> >> >>> > > > > memory. The target use case covers applications writing
>> >> >>> > > > > relatively small, intermediate data sets with low
>> >> >>> > > > > latency. We introduce a new CreateFlag for the existing
>> >> >>> > > > > CreateFile API.
>> >> >>> > > > > HDFS will subsequently attempt to place replicas of
>> >> >>> > > > > file blocks in local memory, with disk writes occurring
>> >> >>> > > > > off the hot path. The current design is a
>> >> >>> > > > > simplification of original ideas from Sanjay Radia on
>> >> >>> > > > > HDFS-5851.
>> >> >>> > > > >
>> >> >>> > > > > Key goals of the feature were minimal API changes, to
>> >> >>> > > > > reduce application burden, and best-effort data
>> >> >>> > > > > durability. The feature is optional and requires
>> >> >>> > > > > appropriate DN configuration from administrators.
>> >> >>> > > > >
>> >> >>> > > > > Design doc:
>> >> >>> > > > > https://issues.apache.org/jira/secure/attachment/12661926/HDFSWriteableReplicasInMemory.pdf
>> >> >>> > > > >
>> >> >>> > > > > Test plan:
>> >> >>> > > > > https://issues.apache.org/jira/secure/attachment/12669452/Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>> >> >>> > > > >
>> >> >>> > > > > There are 28 resolved sub-tasks under HDFS-6581, 3 open
>> >> >>> > > > > tasks for tests and Jenkins issues, and 7 open subtasks
>> >> >>> > > > > tracking planned improvements. The latest merge patch
>> >> >>> > > > > is 3300 lines of changed code, of which 1300 lines are
>> >> >>> > > > > new and updated tests. Merging the branch to trunk will
>> >> >>> > > > > allow HDFS applications to start evaluating the
>> >> >>> > > > > feature. We will continue work on documentation,
>> >> >>> > > > > performance tuning and metrics in parallel with the
>> >> >>> > > > > vote and post-merge.
>> >> >>> > > > >
>> >> >>> > > > > Contributors to the design and code include Xiaoyu Yao,
>> >> >>> > > > > Sanjay Radia, Jitendra Pandey, Tassapol Athiapinya,
>> >> >>> > > > > Gopal V, Bikas Saha, Vikram Dixit, Suresh Srinivas and
>> >> >>> > > > > Chris Nauroth.
>> >> >>> > > > >
>> >> >>> > > > > Thanks to Haohui Mai, Colin Patrick McCabe, Andrew
>> >> >>> > > > > Wang, Todd Lipcon, Eric Baldeschwieler and Vinayakumar
>> >> >>> > > > > B for providing useful feedback on HDFS-6581, HDFS-5851
>> >> >>> > > > > and sub-tasks.
>> >> >>> > > > >
>> >> >>> > > > > The vote runs for the usual 7 days and will expire at
>> >> >>> > > > > 12am PDT on Sep 30. Here is my +1 for the merge.
>> >> >>> > > > >
>> >> >>> > > > > Regards,
>> >> >>> > > > > Arpit
>> >> >>> > > > >
>> >> >>> > > > > --
>> >> >>> > > > > CONFIDENTIALITY NOTICE
>> >> >>> > > > > NOTICE: This message is intended for the use of the
>> >> >>> > > > > individual or entity to which it is addressed and may
>> >> >>> > > > > contain information that is confidential, privileged
>> >> >>> > > > > and exempt from disclosure under applicable law. If the
>> >> >>> > > > > reader of this message is not the intended recipient,
>> >> >>> > > > > you are hereby notified that any printing, copying,
>> >> >>> > > > > dissemination, distribution, disclosure or forwarding
>> >> >>> > > > > of this communication is strictly prohibited. If you
>> >> >>> > > > > have received this communication in error, please
>> >> >>> > > > > contact the sender immediately and delete it from your
>> >> >>> > > > > system. Thank You.