Hi Adam,

> Yeah, I want to do something along those lines, but the Hive
> distribution I am using (Amazon's) is mangling file names to the point
> that I can't fetch additional libraries. This makes grabbing the
> required Perl module a bit challenging.

Apache Hive's 'add' command does not yet support referencing files in HDFS or S3. You can only reference files in the local file system. The ability to reference S3 files in EMR Hive is a feature that the folks at Amazon added, and since this feature hasn't been open sourced I can't explain its odd behavior. Probably the best place for questions about EMR Hive is the Amazon Web Services user forums, where I noticed you already posted a question :)

> Can you rename a file on the local filesystem after issuing an add
> file command? Something along the lines of: add file
> s3://bucket/file.pm#file.pm?

On Apache Hive, the command 'add file /foo/bar/baz.txt' causes Hive to check that /foo/bar/baz.txt exists and to save the path to a list of resources that are added to the DistributedCache whenever a query is executed. Hive expects /foo/bar/baz.txt to keep existing for as long as the path appears in the output of 'list FILE'. There is no support in Apache Hive for renaming files when they are placed in the DistributedCache.

Hope this helps.

Carl
