[ 
https://issues.apache.org/jira/browse/HUDI-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-431:
--------------------------------
    Description: 
We have a basic implementation of an inline filesystem that can read a file 
format like Parquet embedded "inline" in another file. See 

[https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java]
 for sample usage.

The idea here is to see whether we can embed parquet/hfile formats into the 
Hudi log files, to get columnar reads on the delta log files as well. This 
should speed up query performance, given that the log is row based today. Once 
Inline FS is available, enable parquet logging support with HoodieLogFile. 
LogFile can expose a writer (essentially ParquetWriter), and users can write 
records as though writing to parquet files. Similarly, on the read path, a 
reader (parquetReader) will be exposed, which the user can use to read the data 
back out. 

This Jira tracks work to implement such parquet inlining into the log format 
and have the writer and reader use it. 
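
To make the "inline" idea concrete, here is a minimal sketch in plain Java NIO. 
It is not Hudi's InLineFileSystem API: the class name InlineRangeSketch and the 
helpers writeWithInlinePayload and readInline are hypothetical, and the payload 
is just opaque bytes standing in for an embedded parquet/hfile block. It only 
illustrates the underlying mechanism, namely that an outer file can carry an 
embedded block that a reader addresses by (start offset, length).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; none of these names exist in Hudi.
public class InlineRangeSketch {

    /** Writes header + payload + footer into the outer file and returns
     *  {payloadStartOffset, payloadLength} so the payload can be found later. */
    static long[] writeWithInlinePayload(Path outer, byte[] header, byte[] payload, byte[] footer)
            throws IOException {
        try (FileChannel ch = FileChannel.open(outer,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            writeFully(ch, header);
            long start = ch.position();   // the embedded block begins here
            writeFully(ch, payload);
            writeFully(ch, footer);
            return new long[] {start, payload.length};
        }
    }

    /** Reads only the embedded byte range, ignoring the surrounding outer file. */
    static byte[] readInline(Path outer, long start, long length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Math.toIntExact(length));
        try (FileChannel ch = FileChannel.open(outer, StandardOpenOption.READ)) {
            while (buf.hasRemaining()) {
                // Positional read: offset is relative to the outer file.
                int n = ch.read(buf, start + buf.position());
                if (n < 0) {
                    throw new IOException("Inline range extends past end of outer file");
                }
            }
        }
        return buf.array();
    }

    private static void writeFully(FileChannel ch, byte[] bytes) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Path outer = Files.createTempFile("outer-log", ".bin");
        try {
            long[] range = writeWithInlinePayload(
                    outer,
                    "LOG-HEADER".getBytes(StandardCharsets.UTF_8),
                    "embedded-columnar-bytes".getBytes(StandardCharsets.UTF_8),
                    "LOG-FOOTER".getBytes(StandardCharsets.UTF_8));
            byte[] inline = readInline(outer, range[0], range[1]);
            System.out.println(new String(inline, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(outer);
        }
    }
}
```

In the actual feature, the embedded range would presumably hold a complete 
parquet or hfile payload, and a filesystem wrapper would translate reads on the 
inner "file" into offset reads on the log file, so that an unmodified reader can 
be pointed at it. The linked test above is the authoritative reference for how 
the existing inline filesystem is used.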

 

  was:Once Inline FS is available, enable parquet logging support with 
HoodieLogFile. LogFile can expose a writer (essentially ParquetWriter) and 
users can write records as though writing to parquet files. Similarly on the 
read path, a reader (parquetReader) will be exposed which the user can use to 
read data out of it. 


> Design and develop parquet logging in Log file
> ----------------------------------------------
>
>                 Key: HUDI-431
>                 URL: https://issues.apache.org/jira/browse/HUDI-431
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Storage Management
>            Reporter: sivabalan narayanan
>            Assignee: Vinoth Chandar
>            Priority: Major
>              Labels: help-requested
>
> We have a basic implementation of an inline filesystem that can read a file 
> format like Parquet embedded "inline" in another file. See 
> [https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java]
>  for sample usage.
> The idea here is to see whether we can embed parquet/hfile formats into the 
> Hudi log files, to get columnar reads on the delta log files as well. This 
> should speed up query performance, given that the log is row based today. 
> Once Inline FS is available, enable parquet logging support with 
> HoodieLogFile. LogFile can expose a writer (essentially ParquetWriter), and 
> users can write records as though writing to parquet files. Similarly, on 
> the read path, a reader (parquetReader) will be exposed, which the user can 
> use to read the data back out. 
> This Jira tracks work to implement such parquet inlining into the log format 
> and have the writer and reader use it. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
