[ 
https://issues.apache.org/jira/browse/PARQUET-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614094#comment-17614094
 ] 

Chris Collins edited comment on PARQUET-2020 at 10/7/22 2:08 PM:
-----------------------------------------------------------------

I also asked this question on the pull request for this change 
([https://github.com/apache/parquet-mr/pull/888]), but maybe this is a better 
place to ask.

I hope you can point me in the right direction. With the deprecation and 
removal of parquet-tools, is there any way in Java code to render a JSON 
representation of a Parquet record?

We were previously using {{parquet.tools.util.JsonRecordFormatter}} like this:

{code:java}
HadoopInputFile inputFile = HadoopInputFile.fromPath(new Path(filePath), hadoopConfig);

try (ParquetFileReader reader = ParquetFileReader.open(inputFile))
{
    MessageType schema = reader.getFooter().getFileMetaData().getSchema();
    JsonRecordFormatter.JsonGroupFormatter formatter = JsonRecordFormatter.fromSchema(schema);
    PageReadStore pages;

    // Read one row group at a time until the file is exhausted.
    while ((pages = reader.readNextRowGroup()) != null)
    {
        long rows = pages.getRowCount();
        MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
        RecordReader<SimpleRecord> recordReader =
                columnIO.getRecordReader(pages, new SimpleRecordMaterializer(schema));

        // Materialize each record in the row group and print it as JSON.
        for (long i = 0; i < rows; i++)
        {
            SimpleRecord simpleRecord = recordReader.read();
            System.out.println(formatter.formatRecord(simpleRecord));
        }
    }
}
{code}

Is there anything in the remaining libraries that can achieve this? If not, 
could we look at pulling these classes back in, perhaps into 
{{parquet-format-structures}} or some other related project where they make 
sense?

And if this isn't the right place to ask, where should I ask?


was (Author: unsta):
I also asked this question on the pull request for this change 
([https://github.com/apache/parquet-mr/pull/888]), but maybe this is a better 
place to ask.

I hope you can point me in the right direction. With the deprecation and 
removal of parquet-tools, is there any way in Java code to render a JSON 
representation of a Parquet record?

We were previously using {{parquet.tools.util.JsonRecordFormatter}} like this:

{code:java}
HadoopInputFile inputFile = HadoopInputFile.fromPath(new Path(filePath), hadoopConfig);

try (ParquetFileReader reader = ParquetFileReader.open(inputFile))
{
    MessageType schema = reader.getFooter().getFileMetaData().getSchema();
    JsonRecordFormatter.JsonGroupFormatter formatter = JsonRecordFormatter.fromSchema(schema);
    PageReadStore pages;

    // Read one row group at a time until the file is exhausted.
    while ((pages = reader.readNextRowGroup()) != null)
    {
        long rows = pages.getRowCount();
        MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
        RecordReader<SimpleRecord> recordReader =
                columnIO.getRecordReader(pages, new SimpleRecordMaterializer(schema));

        // Materialize each record in the row group and print it as JSON.
        for (long i = 0; i < rows; i++)
        {
            SimpleRecord simpleRecord = recordReader.read();
            System.out.println(formatter.formatRecord(simpleRecord));
        }
    }
}
{code}

Is there anything in the remaining libraries that can achieve this? If not, 
could we look at pulling these classes back in, perhaps into 
{{parquet-format-structures}} or some other related project where they make 
sense?

I would have opened this as an issue, but I don't see issues enabled for this 
repository.

> Remove deprecated modules
> -------------------------
>
>                 Key: PARQUET-2020
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2020
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cascading
>    Affects Versions: 1.12.0
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Removes: 
>  * parquet-tools-deprecated
>  * parquet-scrooge-deprecated
>  * parquet-cascading-common23-deprecated
>  * parquet-cascading-deprecated
>  * parquet-cascading3-deprecated



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
