Why use Storm, unless you need to transform the data in a global way (e.g. a 
streaming join or aggregation), the data is too big for a single box to 
handle, or this is going to be an ongoing process?  Just write a little app 
that takes a file on the command line and uploads it to HBase.  Then you can 
use find and xargs to upload the files in parallel:
find ./dir/to/data -type f | xargs -n 1 -P <NUM_IN_PARALLEL> <COMMAND_TO_UPLOAD>
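
In case it's useful, here is a rough sketch of what that little upload app 
could look like with the plain HBase Java client.  The table name ("files"), 
column family ("cf"), qualifier ("data"), and the choice of the file path as 
the row key are just placeholders; swap in whatever layout fits your data:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FileUploader {
  public static void main(String[] args) throws Exception {
    String file = args[0];  // one file path per invocation, handed to us by xargs
    Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("files"))) {
      byte[] contents = Files.readAllBytes(Paths.get(file));
      Put put = new Put(Bytes.toBytes(file));  // row key = file path (placeholder scheme)
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("data"), contents);
      table.put(put);
    }
  }
}

Then <COMMAND_TO_UPLOAD> above is just something like 
"java -cp <your-jar-plus-hbase-client-deps> FileUploader", and xargs takes 
care of the parallelism for you.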
If you need multiple machines to do this, then launch subsets of the data on 
multiple different machines.  Storm just feels like overkill for a simple 
upload.

- Bobby
 


     On Tuesday, June 9, 2015 11:32 AM, "Rajabhathor, Selvaraj (Contractor)" 
<[email protected]> wrote:
   

 Hi

I am working on a POC to migrate existing Unix-based files into HBase.

We believe we can use Storm to migrate these files - I have successfully 
implemented a POC to read one file and emit it across a test Topology that I 
defined.
My next goal is to "loop through" a directory and emit several files in a 
similar fashion.

How can I implement such a Topology using Storm?

Thanks
Raj


Regards,
Raj Rajabhathor
Big Data Architect,
Capco Contractor @ FannieMae
(917) 952-5597 (cell)
703-833-2539 (direct)


This e-mail and its attachments are confidential and solely for the intended 
addressee(s). Do not share or use them without Fannie Mae's approval. If 
received in error, delete them and contact the sender.

  
