Ethan Setnik created SAMZA-171:
----------------------------------
Summary: Http Yarn package path failing
Key: SAMZA-171
URL: https://issues.apache.org/jira/browse/SAMZA-171
Project: Samza
Issue Type: Bug
Reporter: Ethan Setnik
When an HTTP package path is specified for YARN jobs, the jobs fail with the following error:
14/03/05 16:28:40 WARN security.UserGroupInformation:
PriviledgedActionException as:samza (auth:SIMPLE) cause:java.io.IOException: No
FileSystem for scheme: http
14/03/05 16:28:40 INFO localizer.ResourceLocalizationService: DEBUG: FAILED {
http://s3.amazonaws.com/samza_packages/wikipedia-job-package.tar.gz, 0,
ARCHIVE, null }, No FileSystem for scheme: http
14/03/05 16:28:40 INFO localizer.LocalizedResource: Resource
http://s3.amazonaws.com/samza_packages/wikipedia-job-package.tar.gz
transitioned from DOWNLOADING to FAILED
14/03/05 16:28:40 INFO container.Container: Container
container_1394035672475_0003_02_000001 transitioned from LOCALIZING to
LOCALIZATION_FAILED
14/03/05 16:28:40 INFO localizer.LocalResourcesTrackerImpl: Container
container_1394035672475_0003_02_000001 sent RELEASE event on a resource request
{ http://s3.amazonaws.com/samza_packages/wikipedia-job-package.tar.gz, 0,
ARCHIVE, null } not present in cache.
The job is configured with:
yarn.package.path=http://s3.amazonaws.com/samza_packages/wikipedia-job-package.tar.gz
It looks like some work has already been done to support this feature by
configuring "fs.http.impl". I also noticed that this configuration was
updated in SAMZA-63:
hConfig.set("fs.http.impl", classOf[HttpFileSystem].getName)
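For reference, "fs.http.impl" is a regular Hadoop configuration key, so the same
mapping could presumably also be declared on the cluster side, in the
core-site.xml that the NodeManager reads. A minimal sketch, assuming the fully
qualified class name is org.apache.samza.util.hadoop.HttpFileSystem (only the
simple name appears above) and that the jar containing it is on the
NodeManager's classpath; neither assumption is confirmed in this report:

```xml
<!-- Sketch of a core-site.xml entry on each NodeManager host. -->
<!-- ASSUMPTION: the package name below is not given in this report. -->
<!-- ASSUMPTION: the jar providing HttpFileSystem is on the NodeManager classpath. -->
<property>
  <name>fs.http.impl</name>
  <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
</property>
```

With an entry like this, the localizer would have a FileSystem implementation to
look up for the "http" scheme instead of failing with "No FileSystem for scheme:
http".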
However, my understanding is that the job package itself contains the
HttpFileSystem class needed to load HTTP packages, and YARN does not support
this configuration out of the box, so I'm at a loss as to how to load a remote
package over HTTP.