gszadovszky commented on code in PR #3287:
URL: https://github.com/apache/parquet-java/pull/3287#discussion_r2313043047
##########
parquet-cli/pom.xml:
##########
@@ -85,6 +85,26 @@
<artifactId>parquet-avro</artifactId>
<version>${project.version}</version>
</dependency>
+ <dependency>
+ <groupId>org.apache.parquet</groupId>
+ <artifactId>parquet-protobuf</artifactId>
+ <version>${project.version}</version>
+ <classifier>tests</classifier>
+ <scope>test</scope>
+ </dependency>
+ <!-- CatCommandTest (for TestProtobuf) -->
+ <dependency>
+ <groupId>com.google.protobuf</groupId>
+ <artifactId>protobuf-java</artifactId>
+ <version>3.25.6</version>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.parquet</groupId>
+ <artifactId>parquet-protobuf</artifactId>
+ <version>${project.version}</version>
+ <scope>test</scope>
+ </dependency>
Review Comment:
Wouldn't adding these dependencies in test scope only cause them to be missing
at command-line execution?
There are two ways to use the CLI. One bundles only the "normal"-scoped
dependencies, for the Hadoop environment; the other also bundles the
"provided"-scoped dependencies, for standalone use. I don't think these deps
will be added to either one.
##########
parquet-cli/src/main/java/org/apache/parquet/cli/BaseCommand.java:
##########
@@ -55,14 +56,56 @@
import org.apache.parquet.cli.util.GetClassLoader;
import org.apache.parquet.cli.util.Schemas;
import org.apache.parquet.cli.util.SeekableFSDataInputStream;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.hadoop.example.GroupReadSupport;
import org.slf4j.Logger;
public abstract class BaseCommand implements Command, Configurable {
private static final String RESOURCE_URI_SCHEME = "resource";
private static final String STDIN_AS_SOURCE = "stdin";
+ /**
+ * Note for dev: Due to legacy reasons, parquet-cli used the avro schema reader which
+ * breaks for files generated through proto. This logic is in place to auto-detect such cases
Review Comment:
I'm wondering whether this problem is limited to files created with the
proto binding. I am fine with auto-detecting the proto-related files and using
the example binding to read them automatically, but it might be a good idea to
also offer this as an option for any file.
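
A minimal sketch of that suggestion, not the PR's actual logic: the read path
is chosen either by an explicit user option or by auto-detection. The class
`ReadBindingChooser`, the `forceExample` flag, and the use of the
`parquet.proto.class` footer key as the detection signal are all assumptions
made for illustration; the PR may detect proto-written files differently.

```java
import java.util.Map;

/** Hypothetical helper: picks the binding the CLI would use to read a file. */
final class ReadBindingChooser {

    /** The two read paths discussed in the review. */
    enum Binding { AVRO, EXAMPLE }

    // Assumed detection signal: a footer key/value entry left by the proto binding.
    static final String PROTO_CLASS_KEY = "parquet.proto.class";

    /**
     * @param keyValueMetaData footer key/value metadata of the file (may be null)
     * @param forceExample     hypothetical user option to always use the example binding
     */
    static Binding choose(Map<String, String> keyValueMetaData, boolean forceExample) {
        if (forceExample) {
            // Explicit opt-in works for any file, not only proto-written ones.
            return Binding.EXAMPLE;
        }
        if (keyValueMetaData != null && keyValueMetaData.containsKey(PROTO_CLASS_KEY)) {
            // Auto-detected proto-written file: skip the Avro schema reader.
            return Binding.EXAMPLE;
        }
        // Default: keep the existing Avro-based read path.
        return Binding.AVRO;
    }
}
```

With this shape, the auto-detection stays the default behavior, and a
command-line flag mapped to `forceExample` would let users pick the example
binding for files that the Avro path cannot handle for other reasons.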