Hello, I'm working on a Hadoop project where my data set consists of many HTML files (websites). One aspect of the project involves traditional MapReduce analysis of the data set, but I would also like to use Hadoop as a sort of "cache server," i.e., I want to be able to retrieve the HTML for a website I have already visited.
My question is this: what is the best way to interact with HDFS to make simple existence queries and to retrieve specific files for reading? Ideally I would like to do this at the application level, most likely from Ruby. So far I have explored mounting HDFS in userspace with one of the FUSE packages, but I ran into quite a bit of difficulty installing either of the two popular ones. My second option seems to be Hive, but I haven't been able to find any bindings for Ruby, Python, etc.
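To make the question concrete, here is a rough sketch of the kind of access I'm after. As I understand it, some Hadoop setups expose an HTTP REST interface to HDFS (WebHDFS); assuming that's available on my cluster, an existence check and a read might look something like this in Ruby (the NameNode host, port, and paths below are made up):

```ruby
require "net/http"
require "uri"

# Hypothetical NameNode address -- adjust for the actual cluster.
NAMENODE = "namenode.example.com"
PORT     = 50070 # assumed default NameNode HTTP port

# Does a file exist in HDFS? GETFILESTATUS returns 200 if it does,
# 404 if it doesn't.
def hdfs_exists?(path)
  uri = URI("http://#{NAMENODE}:#{PORT}/webhdfs/v1#{path}?op=GETFILESTATUS")
  Net::HTTP.get_response(uri).is_a?(Net::HTTPSuccess)
end

# Fetch a file's contents. OPEN replies with a redirect to a DataNode,
# so follow the Location header once.
def hdfs_read(path)
  uri = URI("http://#{NAMENODE}:#{PORT}/webhdfs/v1#{path}?op=OPEN")
  res = Net::HTTP.get_response(uri)
  res = Net::HTTP.get_response(URI(res["location"])) if res.is_a?(Net::HTTPRedirection)
  res.body
end

# Hypothetical usage: serve cached HTML if I've already crawled the page.
if hdfs_exists?("/crawl/example.com/index.html")
  puts hdfs_read("/crawl/example.com/index.html")[0, 200]
end
```

Something along those lines (a boolean existence check plus a read) is all I need; WebHDFS is just one possibility I've read about, and any mechanism that gives equivalent calls from Ruby would work.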
Any suggestions or advice would be greatly appreciated!

Cheers,
Mike