[ 
https://issues.apache.org/jira/browse/HUDI-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339992#comment-17339992
 ] 

Vinoth Chandar commented on HUDI-431:
-------------------------------------

[~szhou] Inlining parquet or HFile within a log block and then accessing it via 
the inline fs lets us take advantage of all the underlying capabilities of 
the file format without having to read the entire content into memory. For 
example, for parquet we can perform columnar reads, while for HFile we can 
perform point-ish lookups. 

To support inlining parquet, we need to add a new block type. Like 
HoodieAvroDataBlock, we need a HoodieParquetDataBlock that supports a columnar 
read using a projection schema. The block type would be encoded in the 
header or format anyway, so we don't have to introduce a new format in 
HoodieFileFormat IIUC. 

You can ignore the code structure comments for now. They don't literally mean 
exposing the writer and reader. What we need is to be able to read an inline 
parquet file using a standard parquet reader. The tests for the inline 
filesystem should help clarify usage. 
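
To make the inline idea concrete, here is a minimal sketch in plain Java that 
uses only the JDK. It is not Hudi's InLineFileSystem and it does not touch 
parquet at all: InlineSliceSketch, InlineView, and the header/payload/footer 
bytes are hypothetical names and data made up for illustration. It only shows 
the underlying trick, which is exposing a byte range of an outer file as if it 
were a standalone file, so that a format reader could seek within it and read 
just the parts it needs. 

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; this is NOT Hudi's InLineFileSystem. */
public class InlineSliceSketch {

    /** Read-only view over bytes [start, start + length) of an outer file. */
    static final class InlineView implements AutoCloseable {
        private final FileChannel channel;
        private final long start;
        private final long length;

        InlineView(Path outer, long start, long length) throws IOException {
            this.channel = FileChannel.open(outer, StandardOpenOption.READ);
            this.start = start;
            this.length = length;
        }

        /** Size of the embedded content, not of the outer file. */
        long size() {
            return length;
        }

        /** Reads up to len bytes at a position relative to the inline start. */
        byte[] read(long pos, int len) throws IOException {
            if (pos < 0 || len < 0 || pos > length) {
                throw new IllegalArgumentException("outside inline bounds");
            }
            // Clamp so a read can never spill into the outer file's footer.
            int n = (int) Math.min(len, length - pos);
            ByteBuffer buf = ByteBuffer.allocate(n);
            long filePos = start + pos;
            while (buf.hasRemaining()) {
                int r = channel.read(buf, filePos + buf.position());
                if (r < 0) {
                    throw new IOException("unexpected end of outer file");
                }
            }
            return buf.array();
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

    /** Embeds a payload between a header and footer, then reads a slice of it. */
    static String demo() throws IOException {
        // Made-up bytes standing in for a log block header, an embedded file,
        // and a log block footer.
        byte[] header = "LOG-HEADER".getBytes(StandardCharsets.US_ASCII);
        byte[] payload = "col_a|col_b|col_c".getBytes(StandardCharsets.US_ASCII);
        byte[] footer = "LOG-FOOTER".getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream all = new ByteArrayOutputStream();
        all.write(header);
        all.write(payload);
        all.write(footer);

        Path outer = Files.createTempFile("outer-log", ".bin");
        try {
            Files.write(outer, all.toByteArray());
            try (InlineView view =
                     new InlineView(outer, header.length, payload.length)) {
                // "col_b" starts 6 bytes into the payload; only those 5 bytes
                // are read, not the whole embedded content.
                return new String(view.read(6, 5), StandardCharsets.US_ASCII);
            }
        } finally {
            Files.deleteIfExists(outer);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints "col_b"
    }
}
```

In the real thing the offsets would come from the log block rather than being 
hard-coded, and a standard parquet reader would drive the seeks. The tests for 
the inline filesystem remain the reference for actual usage. 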

 

> Design and develop parquet logging in Log file
> ----------------------------------------------
>
>                 Key: HUDI-431
>                 URL: https://issues.apache.org/jira/browse/HUDI-431
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Storage Management
>            Reporter: sivabalan narayanan
>            Assignee: Vinoth Chandar
>            Priority: Major
>              Labels: help-requested
>
> We have a basic implementation of inline filesystem, to read a file format 
> like Parquet, embedded "inline" into another file.  
> [https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java]
>  for sample usage.
>  The idea here is to see if we can embed parquet/hfile formats into the Hudi 
> log files, to get columnar reads on the delta log files as well. This helps 
> us speed up query performance, given the log is row based today. Once Inline 
> FS is available, enable parquet logging support with HoodieLogFile. LogFile 
> can expose a writer (essentially ParquetWriter) and users can write records 
> as though writing to parquet files. Similarly on the read path, a reader 
> (parquetReader) will be exposed which the user can use to read data out of 
> it. 
> This Jira tracks work to implement such parquet inlining into the log format 
> and have the writer and reader use it. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
