What kind of features are you looking for?

Hi.

We want to use Hadoop Streaming to run some tools that process over 1 million entries per job. Each tool run outputs one string, so we will also have about 1 million outputs per job. Each string (probably 5 KB to 50 KB long) will be parsed into about 25-30 columns. There may be several jobs per day.
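To make the parsing step concrete, here is a rough sketch of what a Streaming mapper for it could look like in Python. It is only an illustration under assumptions: the input line layout (entry id, a tab, then the raw tool output), the `key=value;key=value` output format, and the column names in `COLUMNS` are all made up for the example, since our real tools define their own format and 25-30 columns.

```python
import sys

# Hypothetical column names; the real tools would define 25-30 of these.
COLUMNS = ["status", "score", "label"]


def parse_tool_output(raw):
    """Parse one tool output string into a dict.

    Assumed (made-up) format: 'key=value;key=value;...'.
    """
    fields = {}
    for part in raw.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def to_row(entry_id, raw):
    """Build one tab-separated row: entry id followed by the fixed columns."""
    fields = parse_tool_output(raw)
    values = [fields.get(name, "") for name in COLUMNS]  # missing -> empty
    return "\t".join([entry_id] + values)


def map_lines(lines):
    """Turn 'entry_id<TAB>raw_output' lines into tab-separated rows."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank input lines
        entry_id, _, raw = line.partition("\t")
        yield to_row(entry_id, raw)


def main(stream=sys.stdin, out=sys.stdout):
    """Streaming entry point: read records on stdin, write rows on stdout."""
    for row in map_lines(stream):
        out.write(row + "\n")
```

Used as a mapper script, the file would end with a call to `main()`. Every row has the same number of tab-separated fields, so the job output could be bulk-loaded into one table per job.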

We need to collect the output of these tools and store it somewhere for later analysis. The results of one job need to be together - like in one table.

So we need a DB that can store over one million rows (hmm... or columns?) per table and that supports some nice SQL-style queries. A Hadoop-oriented DB would be nice because it stores data safely (fault tolerant) and, being distributed, it would not have the bottlenecks we have with our current MySQL DB.
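For sizing, a back-of-the-envelope estimate of the raw tool output per job, using only the numbers above (1 million entries, 5 KB to 50 KB each). The sketch assumes decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes); the function name is just for illustration.

```python
def raw_volume_gb(entries, kb_per_entry):
    """Raw output volume in GB, assuming 1 KB = 1000 bytes and 1 GB = 1e9 bytes."""
    return entries * kb_per_entry * 1000 / 1e9


entries_per_job = 1_000_000
low_gb = raw_volume_gb(entries_per_job, 5)    # smallest expected string size
high_gb = raw_volume_gb(entries_per_job, 50)  # largest expected string size
print(f"raw output per job: {low_gb:.0f} GB to {high_gb:.0f} GB")
```

So each job pushes roughly 5 GB to 50 GB of raw strings through the parsing step. The parsed 25-30 columns that actually get stored may be much smaller, depending on what is kept.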