Re: Why rely on url scheme for fetching?

Ankur Chauhan Tue, 04 Nov 2014 02:47:49 -0800

Hi ben,

Thanks for the follow up. I think thats a perfectly fine game plan. I think I 
can break down things into smaller, more isolated chunks. But from the looks of 
MESOS-1316 the invocation code and the testing code seems to change a lot (in a 
good way), so to avoid a bunch of wasted cycles, I was going to hold off a 
little till it is committed and then go on to refactoring the transport parts. 
The json config change (MESOS-1248) is completely disconnected to this so I am 
not too worried about that one.
The only place where some coordination may be required is the 
launcher/fetcher.cpp. It seems there are common changes happening between my  
change https://reviews.apache.org/r/27483/ 
<https://reviews.apache.org/r/27483/> and your 
https://reviews.apache.org/r/21316 <https://reviews.apache.org/r/21316> which 
would mean a little rework but it's totally within reasonable limits.
I think if you get the MESOS-1316 committed and I get MESOS-1711 ( 
https://reviews.apache.org/r/27483 <https://reviews.apache.org/r/27483> ) in, 
both of us would get unblocked in a way that we can get a way that lets us 
concurrently work on MESOS-336 and refactoring fetcher into more testable parts.


Does this make sense to you or did I misunderstand some part?

-- Ankur

> On 4 Nov 2014, at 02:05, Bernd Mathiske <[email protected]> wrote:
> 
> Typo: not -> note
> 
>> On Nov 4, 2014, at 10:59 AM, Bernd Mathiske <[email protected]> wrote:
>> 
>> Hi Ankur,
>> 
>> I like it, too. However, I cannot refrain from relaying (not my choice of 
>> word here) to you the advice to break your relatively large patch down into 
>> smaller parts. My patch for MESOS-336 certainly was, as I know now. My plan 
>> is to get MESOS-1316 (which is now under review) and MESOS-1248 (which 
>> should be quick) committed and then rebase MESOS-336 in smaller pieces. If 
>> you have small pieces, too, I do think we can do this somewhat concurrently, 
>> as your refactoring seems to affect mostly how the transport part works. I 
>> am on the other hand mostly interested in the bookkeeping of what is in 
>> progress, what succeeded, what failed, what has been put where, and so on, 
>> which can be separated, if we are careful about it.
>> 
>> (BTW, I think we need both unit tests and integration tests. And we don’t 
>> have nearly enough of either.)
>> 
>> Bernd
>> 
>>> On Nov 3, 2014, at 6:18 PM, Adam Bordelon <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> + Bernd, who has done some fetcher work, including additional testing, for 
>>> MESOS-1316, MESOS-1945, and MESOS-336
>>> 
>>> On Mon, Nov 3, 2014 at 9:04 AM, Dominic Hamon <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Hi Ankur
>>> 
>>> I think this is a great approach. It makes the code much simpler, 
>>> extensible, and more testable. Anyone that's heard me rant knows I am a big 
>>> fan of unit tests over integration tests, so this shouldn't surprise anyone 
>>> :)
>>> 
>>> If you haven't already, please read the documentation on contributing to 
>>> Mesos and the style guide to ensure all the naming is as expected, then you 
>>> can push the patch to reviewboard to get it reviewed and committed.
>>> 
>>> On Mon, Nov 3, 2014 at 12:49 AM, Ankur Chauhan <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Hi,
>>> 
>>> I did some learning today! This is pretty much a very rough draft of the 
>>> tests/refactor of mesos-fetcher that I have come up with. Again, If there 
>>> are some obvious mistakes, please let me know. (this is my first pass after 
>>> all).
>>> https://github.com/ankurcha/mesos/compare/prefer_2 
>>> <https://github.com/ankurcha/mesos/compare/prefer_2>
>>> 
>>> My main intention is to break the logic of the fetcher info some very 
>>> discrete components that i can write tests against. I am still re-learning 
>>> cpp/mesos code styles etc so I may be a little slow to catch up but I would 
>>> really appreciate any comments and/or suggestions.
>>> 
>>> -- Ankur
>>> @ankurcha
>>> 
>>>> On 2 Nov 2014, at 18:17, Ankur Chauhan <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I noticed that the current set of tests in `src/tests/fetcher_tests.cpp` 
>>>> is pretty coarse grained and are more on the lines of a functional test. I 
>>>> was going to add some tests but it seems like if I am to do that I would 
>>>> need to add a test dependency on hadoop. 
>>>> 
>>>> As an alternative, I propose adding a good set of unit tests around the 
>>>> methods used by `src/launcher/fetcher.cpp` and `src/hdfs/hdfs.cpp`. This 
>>>> should be able to catch a good portion of cases at the same time keeping 
>>>> the dependencies and runtime of tests low. What do you guys thing about 
>>>> this?
>>>> 
>>>> PS: I am pretty green in terms of gtest and the overall c++ testing 
>>>> methodology. Can someone give me pointers to good examples of tests in the 
>>>> codebase.
>>>> 
>>>> -- Ankur
>>>> 
>>>>> On 1 Nov 2014, at 22:54, Adam Bordelon <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> Thank you Ankur. At first glance, it looks great. We'll do a more 
>>>>> thorough review of it very soon.
>>>>> I know Tim St. Clair had some ideas for fixing MESOS-1711 
>>>>> <https://issues.apache.org/jira/browse/MESOS-1711>; he may want to review 
>>>>> too.
>>>>> 
>>>>> On Sat, Nov 1, 2014 at 8:49 PM, Ankur Chauhan <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> Hi Tim,
>>>>> 
>>>>> I just created a review https://reviews.apache.org/r/27483/ 
>>>>> <https://reviews.apache.org/r/27483/> It's my first stab at it and I will 
>>>>> try to add more tests as I figure out how to do the hadoop mocking and 
>>>>> stuff. Have a look and let me know what you think about it so far.
>>>>> 
>>>>> -- Ankur
>>>>> 
>>>>>> On 1 Nov 2014, at 20:05, Ankur Chauhan <[email protected] 
>>>>>> <mailto:[email protected]>> wrote:
>>>>>> 
>>>>>> Yea, i saw that the minute i pressed send. I'll start the review board 
>>>>>> so that people can have a look at the change.
>>>>>> 
>>>>>> -- Ankur
>>>>>> 
>>>>>>> On 1 Nov 2014, at 20:01, Tim Chen <[email protected] 
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> 
>>>>>>> Hi Ankur,
>>>>>>> 
>>>>>>> There is a fetcher_tests.cpp in src/tests.
>>>>>>> 
>>>>>>> Tim
>>>>>>> 
>>>>>>> On Sat, Nov 1, 2014 at 7:27 PM, Ankur Chauhan <[email protected] 
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> Hi Tim,
>>>>>>> 
>>>>>>> I am trying to find/write some test cases. I couldn't find a 
>>>>>>> fetcher_tests.{cpp|hpp} so once I have something, I'll post on review 
>>>>>>> board. I am new to gmock/gtest so bear with me while i get up to speed.
>>>>>>> 
>>>>>>> -- Ankur
>>>>>>> 
>>>>>>>> On 1 Nov 2014, at 19:23, Timothy Chen <[email protected] 
>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>> 
>>>>>>>> Hi Ankur,
>>>>>>>> 
>>>>>>>> Can you post on reviewboard? We can discuss more about the code there.
>>>>>>>> 
>>>>>>>> Tim
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>>>>> 
>>>>>>>> On Nov 1, 2014, at 6:29 PM, Ankur Chauhan <[email protected] 
>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Tim,
>>>>>>>>> 
>>>>>>>>> I don't think there is an issue which is directly in line with what i 
>>>>>>>>> wanted but the closest one that I could find in JIRA is 
>>>>>>>>> https://issues.apache.org/jira/browse/MESOS-1711 
>>>>>>>>> <https://issues.apache.org/jira/browse/MESOS-1711>
>>>>>>>>> 
>>>>>>>>> I have a branch ( 
>>>>>>>>> https://github.com/ankurcha/mesos/compare/prefer_hadoop_fetcher 
>>>>>>>>> <https://github.com/ankurcha/mesos/compare/prefer_hadoop_fetcher> ) 
>>>>>>>>> that has a change that would enable users to specify whatever hdfs 
>>>>>>>>> compatible uris to the mesos-fetcher but maybe you can weight in on 
>>>>>>>>> it. Do you think this is the right track? if so, i would like to pick 
>>>>>>>>> this issue and submit a patch for review.
>>>>>>>>> 
>>>>>>>>> -- Ankur
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 1 Nov 2014, at 04:32, Tom Arnfeld <[email protected] 
>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Completely +1 to this. There are now quite a lot of hadoop 
>>>>>>>>>> compatible filesystem wrappers out in the wild and this would 
>>>>>>>>>> certainly be very useful.
>>>>>>>>>> 
>>>>>>>>>> I'm happy to contribute a patch. Here's a few related issues that 
>>>>>>>>>> might be of interest;
>>>>>>>>>> 
>>>>>>>>>> - https://issues.apache.org/jira/browse/MESOS-1887 
>>>>>>>>>> <https://issues.apache.org/jira/browse/MESOS-1887>
>>>>>>>>>> - https://issues.apache.org/jira/browse/MESOS-1316 
>>>>>>>>>> <https://issues.apache.org/jira/browse/MESOS-1316>
>>>>>>>>>> - https://issues.apache.org/jira/browse/MESOS-336 
>>>>>>>>>> <https://issues.apache.org/jira/browse/MESOS-336>
>>>>>>>>>> - https://issues.apache.org/jira/browse/MESOS-1248 
>>>>>>>>>> <https://issues.apache.org/jira/browse/MESOS-1248>
>>>>>>>>>> 
>>>>>>>>>> On 31 October 2014 22:39, Tim Chen <[email protected] 
>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>> I believe there is already a JIRA ticket for this, if you search for 
>>>>>>>>>> fetcher in Mesos JIRA I think you can find it.
>>>>>>>>>> 
>>>>>>>>>> Tim
>>>>>>>>>> 
>>>>>>>>>> On Fri, Oct 31, 2014 at 3:27 PM, Ankur Chauhan <[email protected] 
>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I have been looking at some of the stuff around the fetcher and saw 
>>>>>>>>>> something interesting. The code for fetcher::fetch method is 
>>>>>>>>>> dependent on a hard coded list of url schemes. No doubt that this 
>>>>>>>>>> works but is very restrictive.
>>>>>>>>>> Hadoop/HDFS in general is pretty flexible when it comes to being 
>>>>>>>>>> able to fetch stuff from urls and has the ability to fetch a large 
>>>>>>>>>> number of types of urls and can be extended by adding configuration 
>>>>>>>>>> into the conf/hdfs-site.xml and core-site.xml
>>>>>>>>>> 
>>>>>>>>>> What I am proposing is that we refactor the fetcher.cpp to prefer to 
>>>>>>>>>> use the hdfs (using hdfs/hdfs.hpp) to do all the fetching if 
>>>>>>>>>> HADOOP_HOME is set and $HADOOP_HOME/bin/hadoop is available. This 
>>>>>>>>>> logic already exists and we can just use it. The fallback logic for 
>>>>>>>>>> using net::download or local file copy is may be left in place for 
>>>>>>>>>> installations that do not have hadoop configured. This means that if 
>>>>>>>>>> hadoop is present we can directly fetch urls such as tachyon://... 
>>>>>>>>>> snackfs:// ... cfs:// .... ftp://... s3://... http:// ... file:// 
>>>>>>>>>> with no extra effort. This makes up for a much better experience 
>>>>>>>>>> when it comes to debugging and extensibility.
>>>>>>>>>> 
>>>>>>>>>> What do others think about this?
>>>>>>>>>> 
>>>>>>>>>> - Ankur
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Dominic Hamon | @mrdo | Twitter
>>> There are no bad ideas; only good ideas that go horribly wrong.
>>> 
>> 
>

Re: Why rely on url scheme for fetching?

Reply via email to