> On Nov. 4, 2014, 9:46 p.m., Vinod Kone wrote: > > src/launcher/fetcher.cpp, line 223 > > <https://reviews.apache.org/r/27483/diff/7/?file=748389#file748389line223> > > > > Why fall through if result.isError()? > > Ankur Chauhan wrote: > The way I am thinking of the logic of fetchers is on the lines of "we try > all fetchers one by one till we succeed *even* if an unsuccessful fetcher can > handle it". This means that if for any reason (maybe mesos-fetcher is running > doesn't have permissions or whatever be the case) a fetcher fails we always > go to the next one. I understand that a case can be made to do something on > the lines of > > ``` > if(local.canHandle(uri)) { > return local.fetch(uri); > } else if (os_net.canHandle(uri)) { > return os_net.fetch(uri); > } else if ( hadoop.canHandle(uri)) { > return hadoop.fetch(uri); > } else { > return Error("Can't download uri:" + uri); > } > ``` > > But this way we are assuming that local fetcher has all possible > permissions to handle local/file uris *and* os_net has all the configuration > to handle all http(s) | ftp/ftp(s) uris and takes away the fallback. It may > be the case that the hadoop fetcher (and hence hadoop client) has some > credentials in its config files that aren't present in the urls themselves. > An example of this is if ftp uris are provided without user or password but > core-site.xml has the required user/password in it. > > Does this answer your concern? > > Vinod Kone wrote: > I'm concerned about how the logs will look for an user when fetching > URIs. If it's a non-local URI they would always see a message and failure > with local copy before using HDFS or Net, which is bad. > > I would much rather make it explicit what you are doing, with comments. > > ``` > if (local.canHandle(uri) { > result = local.fetch(uri); > if (result.isSome()) { > return result; > } else { > LOG(WARNING) << "Failed to copy local file: " << result.Error() << "; > } > } else if (os_net.canHandle(uri)) { > result = os_net.fetch(uri); > if (result.isSome()) { > return result; > } else { > LOG(WARNING) << "Failed to download URI: " << result.Error() << "; > } > } > > // If we are here, one of the following is possible. > // 1. Failed to copy a local URI. > // 2. Failed to download a remote URI using Os.net. > // 3. HDFS compatible URI. > // 4. Unexpected URI. > // For all these cases we try to fetch the URI using Hadoop client as a > fallback. > > return hadoop.fetch(uri); > > ``` > > More importantly, are there really file permission issues where "cp" > doesn't work but "hadoop fs copyToLocal" works? Can you give me a concrete > example? I ask because if the hadoop client hangs for whatever reason (we > have seen this in production at Twitter), it will take upto the executor > registration timeout for the slave to kill the executor and transition the > task to LOST. Getting rid of hadoop dependency was one of the reasons we > install our executors locally. This change adds the dependency back on > hadoop, even for local files, which is unfortunate. > > Ankur Chauhan wrote: > That does make sense, I hadn't considered the logging :-( and I am all up > for no hard external dependencies if I can avoid it. Let me think a little on > that.
The latest patch should take care of this. I am following the same approach as above. - Ankur ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27483/#review59842 ----------------------------------------------------------- On Nov. 5, 2014, 6:39 a.m., Ankur Chauhan wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/27483/ > ----------------------------------------------------------- > > (Updated Nov. 5, 2014, 6:39 a.m.) > > > Review request for mesos, Timothy Chen, Timothy St. Clair, and Vinod Kone. > > > Bugs: MESOS-1711 > https://issues.apache.org/jira/browse/MESOS-1711 > > > Repository: mesos-git > > > Description > ------- > > Previously, the fetcher used a hardcoded list of schemes to determine what > URIs could be fetched by hadoop (if available). This is now changed such that > we first check if hadoop can fetch them for us and then we fallback to the > os::net and then a local copy method (same as it used to be). This allows > users to fetch artifacts from arbitrary filesystems as long as hadoop is > correctly configured (in core-site.xml). > > > Diffs > ----- > > src/hdfs/hdfs.hpp bbfeddef106c598d8379ced085ef0605c4b2f380 > src/launcher/fetcher.cpp 9323c28237010fa065ef34d74435c151ded530a8 > src/tests/fetcher_tests.cpp d7754009a59fedb43e3422c56b3a786ce80164aa > > Diff: https://reviews.apache.org/r/27483/diff/ > > > Testing > ------- > > make check > sudo bin/mesos-tests.sh --verbose > support/mesos-style.py > > > Thanks, > > Ankur Chauhan > >