Hi all,

I wanted to share some prototyping I have been doing for HUDI-46. The idea
is to see if we can embed a Parquet file "inline" into an outer file (our
log), so that users who opt in can also get Parquet data in the logs to
speed up real-time view queries. We would be using the standard
ParquetWriter and ParquetReader on top of a custom FileSystem
implementation.
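
To make the offset-translation idea concrete, here is a rough sketch, not
the code in the commit below. It is plain Python rather than a Hadoop
FileSystem, and the names InlineFileView and write_outer_with_inline are
made up for illustration. It only shows that a reader can treat a byte
range of the outer file as a standalone file, including seeking from the
end, which is where a Parquet reader looks for the footer.

```python
import io
import os
import tempfile


class InlineFileView(io.RawIOBase):
    """Read-only view over bytes [start, start + length) of an outer file.

    Hypothetical illustration: positions seen by the caller are relative
    to the embedded file, and are shifted by `start` on every read.
    """

    def __init__(self, outer_path, start, length):
        super().__init__()
        self._outer = open(outer_path, "rb")
        self._start = start
        self._length = length
        self._pos = 0  # position relative to the embedded file

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=os.SEEK_SET):
        if whence == os.SEEK_SET:
            new_pos = offset
        elif whence == os.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == os.SEEK_END:
            # "End" means the end of the embedded file, not the outer log.
            new_pos = self._length + offset
        else:
            raise ValueError("unsupported whence: %r" % (whence,))
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buf):
        # Never read past the end of the embedded range.
        remaining = max(0, self._length - self._pos)
        n = min(len(buf), remaining)
        if n == 0:
            return 0
        self._outer.seek(self._start + self._pos)
        data = self._outer.read(n)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)

    def close(self):
        if not self.closed:
            self._outer.close()
        super().close()


def write_outer_with_inline(path, header, payload, trailer):
    """Write header + payload + trailer; return (start, length) of payload."""
    with open(path, "wb") as f:
        f.write(header)
        start = f.tell()
        f.write(payload)
        f.write(trailer)
    return start, len(payload)


if __name__ == "__main__":
    # Stand-in bytes, not a real Parquet file: only the magic is realistic.
    payload = b"PAR1" + b"stand-in column data and footer" + b"PAR1"
    path = os.path.join(tempfile.mkdtemp(), "outer.log")
    start, length = write_outer_with_inline(path, b"LOGHEADER", payload, b"LOGTRAILER")
    view = InlineFileView(path, start, length)
    view.seek(-4, os.SEEK_END)
    print(view.read(4))
    view.close()
```

In the real thing the (start, length) pair would have to come from the log
block metadata or be encoded in the inline path; how to do that is one of
the open questions.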

https://github.com/vinothchandar/incubator-hudi/commit/c60f4578f794d0f0d0e194b3e509cc0c5f132576

I wrote a small PoC with TODOs and gaps annotated. I'd like you all to
poke more holes in it and see if we can generalize this to embedding any
file format, e.g. HFile.

I believe we can generalize it and thus build things like external indexing
very easily on the existing log format.

Thanks
Vinoth
