We are evaluating Parquet and HBase for storing a dense and very wide matrix
(potentially more than 600K columns).

I have the following questions:

   - Is there a limit on the number of columns in Parquet or HFile? We expect
   to query 10-100 columns at a time with Spark (see the first sketch after
   this list) - what are the performance implications in that scenario?
   - HBase can support millions of columns - does anyone have prior experience
   comparing Parquet vs HFile performance for wide structured tables?
   - We want a schema-less solution, since the matrix can get wider over time
   (see the second sketch below).
   - Is there a way to generate wide, structured, schema-less Parquet files
   with MapReduce (the input files are in a custom binary format)? A rough
   idea of what we mean is in the third sketch below.
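To make the access pattern concrete, here is roughly the kind of Spark read we
have in mind. This is only a sketch; the table path and column names are made
up, but the point is that Parquet is columnar, so selecting a small projection
should only touch those column chunks:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object WideMatrixRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wide-matrix-read")
      .getOrCreate()

    // Select a handful of the 600K+ columns; projection pushdown should mean
    // only these column chunks are read from disk.
    val wanted = Seq("row_id", "col_000042", "col_017311", "col_599990")
    val df = spark.read
      .parquet("hdfs:///data/wide_matrix.parquet")   // hypothetical path
      .select(wanted.map(col): _*)

    df.show(10)
    spark.stop()
  }
}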
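On the widening point, what we had in mind is something like Spark's
mergeSchema option at Parquet read time, assuming it holds up at this width
(path is again made up):

import org.apache.spark.sql.SparkSession

object WideMatrixEvolvingRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("wide-matrix-merge").getOrCreate()

    // Newer files may carry extra columns; mergeSchema unions the schemas of
    // all files under the path at some metadata cost.
    val df = spark.read
      .option("mergeSchema", "true")
      .parquet("hdfs:///data/wide_matrix.parquet")   // hypothetical path

    println(df.schema.fields.length)                 // total columns seen so far
    spark.stop()
  }
}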
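And on generating the files: we have not settled on the exact pipeline, but
below is a rough Spark-based sketch (not raw MapReduce) of what we mean by
building the schema programmatically rather than declaring it by hand.
decodeRecord, the paths, the column naming, and the 600K constant are all
placeholders for our actual binary format:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object WideMatrixWrite {
  // Stand-in for whatever parses one record of the custom binary format.
  def decodeRecord(bytes: Array[Byte], nCols: Int): Row = {
    val values = Array.fill[Any](nCols)(0.0)          // placeholder decode
    Row.fromSeq("some_row_id" +: values.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("wide-matrix-write").getOrCreate()
    val nCols = 600000                                // grows over time

    // Build the (very wide) schema at runtime instead of declaring it by hand;
    // how well Parquet/Spark handle metadata at this width is what we want to test.
    val fields = StructField("row_id", StringType) +:
      (0 until nCols).map(i => StructField(f"col_$i%06d", DoubleType))
    val schema = StructType(fields)

    // binaryFiles yields (path, PortableDataStream) pairs for the raw inputs.
    val rows = spark.sparkContext
      .binaryFiles("hdfs:///data/raw/*.bin")          // hypothetical input path
      .map { case (_, stream) => decodeRecord(stream.toArray(), nCols) }

    spark.createDataFrame(rows, schema)
      .write
      .mode("append")
      .parquet("hdfs:///data/wide_matrix.parquet")
    spark.stop()
  }
}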

What solutions other than Parquet and HBase would be useful for this use case?
