[ https://issues.apache.org/jira/browse/PARQUET-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330424#comment-17330424 ]
Gabor Szadovszky commented on PARQUET-2026:
-------------------------------------------
[~vitalii], thanks for the explanation. I still think that an empty table
should not require an empty parquet file to be created. That said, I am not
against allowing empty parquet files to be created, but we have to
investigate this carefully. Does the format itself logically allow an empty
file? E.g. what should the accepted values be for the data/dictionary page
offsets? (These are required fields.) If we think the format allows this, we
should write proper unit tests in parquet-mr to ensure we can handle empty
files in any scenario and with any binding. Even though this is a regression,
we could not catch it because we did not have any unit tests for it. I think
the ability to create empty files was more a hidden feature than an
intentional one. If we re-introduce this feature, we should do it properly.
> Allow empty row in parquet file
> -------------------------------
>
> Key: PARQUET-2026
> URL: https://issues.apache.org/jira/browse/PARQUET-2026
> Project: Parquet
> Issue Type: Task
> Components: parquet-mr
> Affects Versions: 1.12.0
> Reporter: Vitalii Diravka
> Priority: Major
> Labels: Drill, empty-file
> Fix For: 1.13.0
>
> Attachments: Screenshot from 2021-04-13 08-52-56.png
>
>
> Since PARQUET-1851, parquet-mr no longer writes parquet files that have a
> schema (meta information) but 0 rows, i.e. empty files.
> As a result, empty tables can no longer be stored in Drill as parquet files,
> for example:
> {code:java}
> CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0{code}
> {code:java}
> CREATE TABLE dfs.tmp.%s AS select * from
> dfs.`parquet/alltypes_required.parquet` where `col_int` = 0{code}
> {code:java}
> create table dfs.tmp.%s as select * from
> dfs.`parquet/empty/complex/empty_complex.parquet`{code}
> So PARQUET-1851 breaks the following test cases:
> {code:java}
> TestUntypedNull.testParquetTableCreation
> TestParquetWriterEmptyFiles.testComplexEmptyFileSchema
> TestParquetWriterEmptyFiles.testWriteEmptyFile
> TestParquetWriterEmptyFiles.testWriteEmptyFileWithSchema
> TestParquetWriterEmptyFiles.testWriteEmptySchemaChange
> TestMetastoreCommands.testAnalyzeEmptyRequiredParquetTable
> TestMetastoreCommands.testSelectEmptyRequiredParquetTable{code}
> I suggest logging a warning when an empty parquet file is created, or
> adding an alternative _endBlock_ for backward compatibility with other tools:
> !Screenshot from 2021-04-13 08-52-56.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)