We are evaluating Parquet and HBase for storing a dense & very, very wide matrix (can have more than 600K columns).
I have the following questions:

- Is there a limit on the number of columns in Parquet or HFile? We expect to query 10-100 columns at a time using Spark; what are the performance implications in this scenario? (A sketch of the query pattern we intend is shown after this list.)
- HBase can support millions of columns. Does anyone have prior experience comparing Parquet vs. HFile performance for wide structured tables?
- We want a schema-less solution, since the matrix can get wider over time. Is there a way to generate wide, structured, schema-less Parquet files using MapReduce (the input files are in a custom binary format)? (See the second sketch below for the kind of job we have in mind.)
- What solutions other than Parquet and HBase would be useful for this use case?
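
For context, here is a minimal sketch of the read path we intend, assuming the matrix is stored as a single wide Parquet table; the path and column names (`row_id`, `c000123`, ...) are placeholders:

```scala
// Minimal sketch (Spark/Scala): read the wide Parquet table and select only a
// small subset of its columns. The path and column names are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object WideMatrixQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("wide-matrix-query").getOrCreate()

    // Parquet is columnar, so selecting ~10-100 of the 600K columns should
    // only have to read those column chunks -- this is the behavior we want
    // to confirm at this width.
    val matrix = spark.read.parquet("/data/matrix.parquet")

    val wanted = Seq("row_id", "c000123", "c004567", "c450001")
    val subset = matrix.select(wanted.map(col): _*)

    subset.show(10)
    spark.stop()
  }
}
```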

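For the Parquet-generation question, this is roughly what we have in mind; it is a Spark sketch rather than raw MapReduce, the schema is built programmatically from the number of columns found in the input, and the decoder for our custom binary format is stubbed out:

```scala
// Sketch (Spark/Scala) of generating a wide Parquet file whose schema is
// constructed at runtime, so it can grow as the matrix gets wider.
// decodeRecord is a stub standing in for our custom binary-format parser.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object WideMatrixWriter {
  // Stub decoder: the real code would parse one record of the custom binary
  // format into (rowId, one value per matrix column). This version just
  // fabricates a row so the sketch compiles and runs end to end.
  def decodeRecord(bytes: Array[Byte], numCols: Int): (String, Array[Double]) =
    (java.util.UUID.randomUUID().toString, Array.fill(numCols)(0.0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("wide-matrix-writer").getOrCreate()
    val numCols = 600000 // in practice, discovered from the input files

    // Schema is generated, not hand-written: one field per matrix column.
    val schema = StructType(
      StructField("row_id", StringType, nullable = false) +:
        (0 until numCols).map(i => StructField(f"c$i%06d", DoubleType, nullable = true))
    )

    val rows = spark.sparkContext
      .binaryFiles("/data/raw/*.bin") // custom binary inputs
      .map { case (_, stream) => decodeRecord(stream.toArray(), numCols) }
      .map { case (id, values) => Row.fromSeq(id +: values.toSeq) }

    spark.createDataFrame(rows, schema)
      .write.mode("overwrite").parquet("/data/matrix.parquet")

    spark.stop()
  }
}
```

Whether Spark and Parquet handle a 600K-field schema like this gracefully is exactly what we are unsure about.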