[
https://issues.apache.org/jira/browse/HUDI-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-431:
----------------------------
Sprint: Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10,
Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-25 (was: Hudi-Sprint-Jan-3,
Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18)
> Support Parquet in MOR log files
> --------------------------------
>
> Key: HUDI-431
> URL: https://issues.apache.org/jira/browse/HUDI-431
> Project: Apache Hudi
> Issue Type: New Feature
> Components: storage-management
> Reporter: sivabalan narayanan
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: help-requested, pull-request-available
> Fix For: 0.11.0
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> We have a basic implementation of an inline filesystem that can read a file
> format like Parquet when it is embedded "inline" in another file. See
> [https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java]
> for sample usage.
> The idea here is to see if we can embed Parquet/HFile formats into the Hudi
> log files, to get columnar reads on the delta log files as well. This should
> speed up query performance, given that the log is row-based today. Once the
> inline FS is available, enable Parquet logging support with HoodieLogFile.
> LogFile can expose a writer (essentially a ParquetWriter) so that users can
> write records as though writing to Parquet files. Similarly, on the read
> path, a reader (parquetReader) will be exposed that users can use to read
> data back out.
> This Jira tracks the work to implement such Parquet inlining in the log
> format and to have the writer and reader use it.
>
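To make the "inline" idea in the description concrete, here is a minimal sketch that uses only the JDK. It is not Hudi's InLineFileSystem or its log format. The class InlineSliceDemo, its Slice type, and the 4-byte length header are hypothetical and chosen for illustration. The sketch appends opaque payloads to an outer file and reads one back using only its (offset, length), which is roughly what an inline reader would need in order to hand an embedded Parquet file to a Parquet reader.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of Hudi.
final class InlineSliceDemo {

    /** Location of an embedded payload inside the outer file. */
    static final class Slice {
        final long offset;
        final int length;

        Slice(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Appends a 4-byte length header and the payload; returns where the payload landed. */
    static Slice appendBlock(Path outer, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(outer.toFile(), "rw")) {
            raf.seek(raf.length());               // position at the end of the outer file
            raf.writeInt(payload.length);         // assumed header layout: payload length only
            long offset = raf.getFilePointer();   // the payload starts right after the header
            raf.write(payload);
            return new Slice(offset, payload.length);
        }
    }

    /** Reads exactly the embedded bytes, ignoring everything else in the outer file. */
    static byte[] readSlice(Path outer, Slice slice) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(outer.toFile(), "r")) {
            byte[] buf = new byte[slice.length];
            raf.seek(slice.offset);
            raf.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path outer = Files.createTempFile("outer-log", ".bin");
        try {
            appendBlock(outer, "some-other-block".getBytes(StandardCharsets.UTF_8));
            Slice s = appendBlock(outer, "embedded-payload".getBytes(StandardCharsets.UTF_8));
            // Only the second payload comes back, addressed purely by offset and length.
            System.out.println(new String(readSlice(outer, s), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(outer);
        }
    }
}
```

In a real implementation the payload would be a complete Parquet file, and the slice would be exposed through a filesystem abstraction rather than a raw byte array, so that existing Parquet readers can seek within it.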