From Wail Alkowaileet <[email protected]>:

Attention is currently required from: [email protected].

Wail Alkowaileet has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209 )

Change subject: [WIP] Support COPY TO in parquet
......................................................................


Patch Set 19:

(8 comments)

File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/writer/printer/TextualExternalFileParquetPrinter.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/99693160_2792d18b
PS19, Line 56: TextualExternalFileParquetPrinter
Not Textual. Rename to ParquetExternalFilePrinter.


File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/writer/printer/TextualExternalFileParquetPrinterFactory.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/0938e7e9_b8d12485
PS19, Line 26: TextualExternalFileParquetPrinterFactory
Rename to ParquetExternalFilePrinterFactory.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/b2c4a951_3e7af682
PS19, Line 32: Object
Change it to IAType and make it private and final. Do the same wherever typeInfo is declared as Object.


File asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/provider/ExternalWriterProvider.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/890a9e24_8d9da3c7
PS19, Line 132: sourceType
Cast to IAType.


File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/pointables/printer/parquet/FieldNamesDictionary.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/3a6c3eb1_1da59f2f
PS19, Line 42: getOrCreateFieldNameIndex
Similar to the original implementation, we might have a collision. Although I have never seen one, let us put a TODO here to address that.


File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/pointables/printer/parquet/ParquetRecordLazyVisitor.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/2e29c290_c7b2c580
PS19, Line 52: if (type.getTypeTag() != ATypeTag.OBJECT) {
             :     throw new RuntimeException("Type Unsupported for parquet printing");
             : }
This should be done in ExternalWriterProvider to fail early.

Also, what if the type is ANY? We should allow ANY, because the type can be anything and can be determined only at runtime.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/2128509c_6f69e085
PS19, Line 67: GroupType groupType = (GroupType) type
What if type is not a group type? This could happen if the declared schema does not conform to the actual data. For example, the user has declared 'name' as string, but in the data it is '{"first": "John", "last": "Smith"}'.


File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/pointables/printer/parquet/ParquetRecordVisitorUtils.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/e43d7c6f_b05e32bc
PS19, Line 139: switch (primitiveTypeName) {
              :     case INT64:
              :         recordConsumer.addLong(bigIntValue);
              :         break;
              :     case FLOAT:
              :         recordConsumer.addFloat(bigIntValue);
              :         break;
              :     case DOUBLE:
              :         recordConsumer.addDouble(bigIntValue);
              :         break;
              :     case INT32:
              :     case BOOLEAN:
              :     case BINARY:
              :     case FIXED_LEN_BYTE_ARRAY:
              :     case INT96:
              :     default:
              :         throw new HyracksDataException(
              :                 "Typecast impossible from " + typeTag + " to " + primitiveTypeName);
              : }
Let's extract this to a function and use it for all integer variants (byte, short, int, and long):

    void addInteger(long value, PrimitiveType type) throws HyracksDataException {
        // Resolve the Parquet primitive type name from the declared type
        PrimitiveType.PrimitiveTypeName primitiveTypeName = type.getPrimitiveTypeName();
        switch (primitiveTypeName) {
            case INT64:
                recordConsumer.addLong(value);
                break;
            case FLOAT:
                recordConsumer.addFloat(value);
                break;
            case DOUBLE:
                recordConsumer.addDouble(value);
                break;
            case INT32:
            case BOOLEAN:
            case BINARY:
            case FIXED_LEN_BYTE_ARRAY:
            case INT96:
            default:
                throw new HyracksDataException(
                        "Typecast impossible from " + typeTag + " to " + primitiveTypeName);
        }
    }


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209
To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I40dc16969e66af09cde04b460f441af666b39d51
Gerrit-Change-Number: 18209
Gerrit-PatchSet: 19
Gerrit-Owner: [email protected]
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-CC: Wail Alkowaileet <[email protected]>
Gerrit-Attention: [email protected]
Gerrit-Comment-Date: Tue, 16 Apr 2024 19:38:03 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment
