Hi all,

I wanted to share some prototyping I have been doing for HUDI-46. The idea is to see if we can embed a parquet file "inline" into an outer file (our log), so that users who opt in can also get parquet data in the logs to speed up real-time view queries. We would use the standard ParquetWriter and ParquetReader on top of a custom FileSystem implementation.
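
To make the "inline" idea concrete, here is a minimal sketch of the offset translation such a layer would need. It is not the PoC code and not a Hudi or Hadoop API: it is a hypothetical Python illustration, and the class name InlineSlice, the header/footer bytes, and the fake inner payload are all assumptions made up for the example. The inner bytes only mimic a parquet file by starting and ending with the PAR1 magic. The point it shows is that a reader seeking relative to the end of the inner file (as footer-based formats do) must land inside the embedded range, not at the end of the outer log.

```python
import io


class InlineSlice(io.RawIOBase):
    """Read-only view of `length` bytes starting at `start` in an outer file.

    Hypothetical helper for illustration only. Positions seen by the caller
    are relative to the embedded file, not to the outer file.
    """

    def __init__(self, outer, start, length):
        super().__init__()
        self._outer = outer
        self._start = start
        self._length = length
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            # "End" means the end of the embedded file, not the outer log.
            pos = self._length + offset
        else:
            raise ValueError("invalid whence")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return pos

    def readinto(self, buf):
        # Never read past the end of the embedded range.
        remaining = max(0, self._length - self._pos)
        n = min(len(buf), remaining)
        if n == 0:
            return 0
        # Translate the inner position to an outer-file position.
        self._outer.seek(self._start + self._pos)
        data = self._outer.read(n)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)


# Made-up outer "log" layout: header, embedded file, footer.
HEADER = b"LOGHDR"
INNER = b"PAR1" + b"column-data" + b"PAR1"  # stand-in, not real parquet
FOOTER = b"LOGEND"

outer = io.BytesIO(HEADER + INNER + FOOTER)
inner = InlineSlice(outer, start=len(HEADER), length=len(INNER))

# A footer-first reader seeks from the end of the inner file.
inner.seek(-4, io.SEEK_END)
tail_magic = inner.read(4)

# Reading from the start returns only the embedded bytes.
inner.seek(0)
whole = inner.read()
```

Any reader that only needs seek and read on a stream could, in principle, be pointed at such a view without knowing it is embedded, which is the property that would let the same approach cover other formats.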

https://github.com/vinothchandar/incubator-hudi/commit/c60f4578f794d0f0d0e194b3e509cc0c5f132576

I wrote a small PoC with TODOs and gaps annotated. I wanted to see if you all can poke more holes here, and whether we can generalize this to embedding any file, e.g. an HFile. I believe we can, and that would let us build things like external indexing very easily on the existing log format.

Thanks
Vinoth