[ https://issues.apache.org/jira/browse/PARQUET-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614094#comment-17614094 ]
Chris Collins edited comment on PARQUET-2020 at 10/7/22 2:09 PM:
-----------------------------------------------------------------
I asked this question on the pull request for this change as well
([https://github.com/apache/parquet-mr/pull/888]), but maybe this is a better
place to ask.
I hope you can point me in the right direction. With the deprecation and
removal of parquet-tools, is there any way in Java code to render a JSON
representation of a Parquet record?
We were previously using {{parquet.tools.util.JsonRecordFormatter}} like this:
{code:java}
HadoopInputFile inputFile = HadoopInputFile.fromPath(new Path(filePath), hadoopConfig);
try (ParquetFileReader reader = ParquetFileReader.open(inputFile))
{
    MessageType schema = reader.getFooter().getFileMetaData().getSchema();
    JsonRecordFormatter.JsonGroupFormatter formatter = JsonRecordFormatter.fromSchema(schema);
    // The column IO depends only on the schema, so build it once.
    MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
    PageReadStore pages;
    // Read one row group at a time until the file is exhausted.
    while ((pages = reader.readNextRowGroup()) != null)
    {
        long rows = pages.getRowCount();
        RecordReader<SimpleRecord> recordReader =
            columnIO.getRecordReader(pages, new SimpleRecordMaterializer(schema));
        for (long i = 0; i < rows; i++)
        {
            SimpleRecord simpleRecord = recordReader.read();
            System.out.println(formatter.formatRecord(simpleRecord));
        }
    }
}
{code}
Is there anything in the remaining libraries that can achieve this? If not,
could we look at pulling these classes back in, maybe into
{{parquet-format-structures}} or some other related project where they make sense?
And if this isn't the right place to ask, where should I ask?
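For readers unfamiliar with what the removed formatter did, the following is a minimal sketch of rendering a nested record as a JSON string using only the JDK. It is not the parquet-tools implementation: the class name {{RecordJsonSketch}} is hypothetical, and a plain {{Map}}/{{List}} structure stands in for a materialized Parquet record as an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record-to-JSON formatter; not part of any Parquet library.
public class RecordJsonSketch {

    /** Renders a value as JSON: Map -> object, List -> array, String -> quoted string. */
    public static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                parts.add(quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return "{" + String.join(",", parts) + "}";
        }
        if (value instanceof List) {
            List<String> parts = new ArrayList<>();
            for (Object item : (List<?>) value) {
                parts.add(toJson(item));
            }
            return "[" + String.join(",", parts) + "]";
        }
        // Note: NaN and infinite doubles are not valid JSON and are not handled here.
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return quote(value.toString());
    }

    /** Wraps a string in double quotes and escapes characters JSON does not allow raw. */
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps field order stable, as a schema-driven formatter would.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1L);
        record.put("name", "example");
        record.put("tags", Arrays.asList("x", "y"));
        System.out.println(toJson(record));
    }
}
```

In a real reader loop, each materialized record would first have to be converted into such a nested structure (or walked directly against the schema) before being formatted.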
> Remove deprecated modules
> -------------------------
>
> Key: PARQUET-2020
> URL: https://issues.apache.org/jira/browse/PARQUET-2020
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cascading
> Affects Versions: 1.12.0
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 1.13.0
>
>
> Removes:
> * parquet-tools-deprecated
> * parquet-scrooge-deprecated
> * parquet-cascading-common23-deprecated
> * parquet-cascading-deprecated
> * parquet-cascading3-deprecated
--
This message was sent by Atlassian Jira
(v8.20.10#820010)