I agree it is expensive, and the sole purpose would be "industry standard", as 
in I could query the HFile/Parquet data with any tool.

I can't help but look at what Hudi and Delta Lake did and feel they are doing 
basically the same things as HBase (compaction, WAL, etc.), but without the 
stronger isolation and performance characteristics.



-----Original Message-----
From: Stack <[email protected]> 
Sent: Tuesday, March 3, 2020 1:59 PM
To: HBase Dev List <[email protected]>
Subject: RE: [EXTERNAL]Parquet vs HFile




On Mon, Mar 2, 2020 at 10:27 AM Burd, Roni <[email protected]>
wrote:

> Has anyone looked at leveraging Parquet files to replace HFiles? I 
> recognize that HFiles may be more advanced for the hbase case, but my 
> assumption is that Parquet can be evolved as well.
>
> This would also help hfiles align better with a more widely adopted 
> industry standard.
>
> Thoughts?
>

I'd think the mismatch between the formats would be expensive for little 
benefit other than 'industry standard', unless work was done to teach hbase 
about columns at least as far up as the hbase 'block', as described in the 
'Ressi data layout' in [1].
Thanks,
S

 1. https://dl.acm.org/doi/pdf/10.1145/3035918.3056103
