guanziyue commented on issue #2648:
URL: https://github.com/apache/hudi/issues/2648#issuecomment-794236129
Hi,
Happy to see your reply. I'd like to share more information about this.
After I posted this, I found out more with help from my colleague. To
reproduce the problem, we need an index whose canIndexLogFiles() returns true,
such as HBaseIndex. In that case, DeltaCommitActionExecutor appends insert
records to a log file rather than creating a parquet base file, as this code
shows:
https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/DeltaCommitActionExecutor.java#L94
This is how a fileGroup without a parquet base file is produced. However, with
my limited knowledge of the data source, it seems to assume that every
fileGroup has a parquet base file and that all log files are appended to that
base file. I guess this may be the root cause of the error.
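
To make the suspected mismatch concrete, here is a minimal sketch of the writer and reader behavior described above. It is only a simplified model under my own assumptions: FileGroupSketch, FileGroup, writeInserts and readAssumingBaseFile are hypothetical names, not Hudi classes or methods, and the real data source may fail in a different way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical model of the scenario; none of these types are Hudi classes.
class FileGroupSketch {

    /** A file group: an optional base file plus zero or more log files. */
    static final class FileGroup {
        Optional<String> baseFile = Optional.empty();
        final List<String> logFiles = new ArrayList<>();
    }

    /** Writer side: decide where the first inserts of a new file group go. */
    static FileGroup writeInserts(boolean canIndexLogFiles) {
        FileGroup group = new FileGroup();
        if (canIndexLogFiles) {
            // The index can locate records in log files, so inserts are appended to a log file.
            group.logFiles.add("log-1");
        } else {
            // Otherwise inserts create a parquet base file.
            group.baseFile = Optional.of("base.parquet");
        }
        return group;
    }

    /** Reader side: a reader that assumes every file group has a base file (assumption). */
    static String readAssumingBaseFile(FileGroup group) {
        String base = group.baseFile.orElseThrow(
            () -> new IllegalStateException("file group has log files but no base file"));
        return base + " + " + group.logFiles.size() + " log file(s)";
    }

    public static void main(String[] args) {
        // Index that cannot index log files: the reader finds a base file.
        System.out.println(readAssumingBaseFile(writeInserts(false)));
        // Index that can index log files: the reader's assumption breaks.
        try {
            readAssumingBaseFile(writeInserts(true));
        } catch (IllegalStateException e) {
            System.out.println("reader failed: " + e.getMessage());
        }
    }
}
```

In this model the failure only appears when canIndexLogFiles is true, which matches what we observed, but it does not prove that the data source fails for the same reason.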
I plan to test whether making canIndexLogFiles() return false avoids this
problem as a temporary workaround. The other option I can come up with for now
is to generate a parquet file when inserting records.
Could you please correct me if I have made a mistake?