From Wail Alkowaileet <[email protected]>:

Attention is currently required from: [email protected].
Wail Alkowaileet has posted comments on this change. (https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209)

Change subject: [WIP] Support COPY TO in parquet
......................................................................


Patch Set 19:

(8 comments)

File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/writer/printer/TextualExternalFileParquetPrinter.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/99693160_2792d18b
PS19, Line 56: TextualExternalFileParquetPrinter
This printer is not textual. Rename it to ParquetExternalFilePrinter.


File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/writer/printer/TextualExternalFileParquetPrinterFactory.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/0938e7e9_b8d12485
PS19, Line 26: TextualExternalFileParquetPrinterFactory
Rename to ParquetExternalFilePrinterFactory


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/b2c4a951_3e7af682
PS19, Line 32: Object
Change it to IAType and make it private and final. Do the same wherever
typeInfo is declared as Object.
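A minimal sketch of what the requested change looks like, assuming the factory simply stores the type and passes it to each printer it creates. IAType below is a hypothetical stand-in interface so the snippet compiles on its own (in AsterixDB it would be org.apache.asterix.om.types.IAType), and the class and method names are illustrative only.

```java
import java.util.Objects;

// Hypothetical stand-in for org.apache.asterix.om.types.IAType, only so this sketch compiles alone.
interface IAType {
    String getTypeName();
}

// Hypothetical printer that receives the already-typed schema information.
final class ParquetPrinterSketch {
    private final IAType typeInfo;

    ParquetPrinterSketch(IAType typeInfo) {
        this.typeInfo = typeInfo;
    }

    IAType getTypeInfo() {
        return typeInfo;
    }
}

// Illustrative factory: typeInfo is typed as IAType (not Object), private, and final,
// so it is assigned exactly once and no downstream casts are needed.
final class ParquetPrinterFactorySketch {
    private final IAType typeInfo;

    ParquetPrinterFactorySketch(IAType typeInfo) {
        this.typeInfo = Objects.requireNonNull(typeInfo, "typeInfo");
    }

    ParquetPrinterSketch createPrinter() {
        return new ParquetPrinterSketch(typeInfo);
    }
}
```

With the field typed, a wrong argument is rejected by the compiler instead of surfacing later as a ClassCastException.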


File asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/provider/ExternalWriterProvider.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/890a9e24_8d9da3c7
PS19, Line 132: sourceType
Cast to IAType


File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/pointables/printer/parquet/FieldNamesDictionary.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/3a6c3eb1_1da59f2f
PS19, Line 42: getOrCreateFieldNameIndex
As in the original implementation, we might have a collision here. Although I
haven't seen one at all, let's put a TODO here to address it.
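A sketch of one way the TODO could eventually be resolved, assuming the collision concern is that the dictionary maps a hash of the field name to its index. The class below is hypothetical and is not the existing FieldNamesDictionary: it keeps every index that shares a hash (a bucket) and compares the actual names before reusing an index, so two different names with the same hash get distinct indexes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical collision-aware field-name dictionary.
final class CollisionSafeFieldNames {
    private final List<String> fieldNames = new ArrayList<>();
    // For each hash value, every index whose field name produced that hash.
    private final Map<Long, List<Integer>> hashToIndexes = new HashMap<>();
    private final ToLongFunction<String> hashFunction;

    CollisionSafeFieldNames(ToLongFunction<String> hashFunction) {
        this.hashFunction = hashFunction;
    }

    int getOrCreateFieldNameIndex(String fieldName) {
        long hash = hashFunction.applyAsLong(fieldName);
        List<Integer> bucket = hashToIndexes.computeIfAbsent(hash, h -> new ArrayList<>());
        // Equal hashes are not enough: compare the actual names to rule out a collision.
        for (int index : bucket) {
            if (fieldNames.get(index).equals(fieldName)) {
                return index;
            }
        }
        int newIndex = fieldNames.size();
        fieldNames.add(fieldName);
        bucket.add(newIndex);
        return newIndex;
    }

    String getFieldName(int index) {
        return fieldNames.get(index);
    }
}
```

With a deliberately weak hash such as the string length, "ab" and "cd" collide but still receive indexes 0 and 1, and looking up "ab" again returns 0.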


File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/pointables/printer/parquet/ParquetRecordLazyVisitor.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/2e29c290_c7b2c580
PS19, Line 52:    if (type.getTypeTag() != ATypeTag.OBJECT) {
             :             throw new RuntimeException("Type Unsupported for parquet printing");
             :         }
This check should be done in ExternalWriterProvider so that it fails early.
Also, what if the type is ANY? We should allow ANY, since the type can be
anything and can only be determined at runtime.
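A minimal sketch of the early check, assuming it runs when ExternalWriterProvider builds the Parquet writer. TypeTagSketch is a hypothetical stand-in for a subset of ATypeTag, and the method name and exception type are illustrative; the point is only that OBJECT and ANY pass while every other tag fails before any data is written.

```java
// Hypothetical stand-in for a subset of ATypeTag values.
enum TypeTagSketch {
    OBJECT, ANY, STRING, BIGINT, ARRAY
}

final class ParquetTypeValidationSketch {
    private ParquetTypeValidationSketch() {
    }

    // Rejects unsupported types when the writer is set up, not in the printer at runtime.
    // ANY is accepted because the actual type is only known once the data arrives.
    static void validateParquetSourceType(TypeTagSketch tag) {
        if (tag != TypeTagSketch.OBJECT && tag != TypeTagSketch.ANY) {
            throw new IllegalArgumentException("Type " + tag + " is unsupported for parquet printing");
        }
    }
}
```

If ANY is accepted up front, a runtime check on each value presumably remains necessary for values that turn out not to be objects.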


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/2128509c_6f69e085
PS19, Line 67: GroupType groupType = (GroupType) type
What if type is not a GroupType? This could happen if the declared schema does
not conform to the actual data. For example, the user has declared 'name' as a
string, but in the data it is {"first": "John", "last": "Smith"}.
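A sketch of one way to guard the cast, using hypothetical stand-in classes rather than the Parquet schema API. In parquet-mr, Type.isPrimitive() plays the role of the check below, and Type.asGroupType() throws a ClassCastException for a primitive type. Whether a mismatch should raise a descriptive error, as here, or be handled some other way is left open by this review.

```java
// Hypothetical stand-ins for a node in the declared Parquet schema.
abstract class SchemaNodeSketch {
    private final String name;

    SchemaNodeSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    abstract boolean isPrimitive();
}

final class PrimitiveNodeSketch extends SchemaNodeSketch {
    PrimitiveNodeSketch(String name) {
        super(name);
    }

    @Override
    boolean isPrimitive() {
        return true;
    }
}

final class GroupNodeSketch extends SchemaNodeSketch {
    GroupNodeSketch(String name) {
        super(name);
    }

    @Override
    boolean isPrimitive() {
        return false;
    }
}

final class GroupCastSketch {
    private GroupCastSketch() {
    }

    // Checks the declared type before casting, so an object value whose field was
    // declared as a primitive (e.g. 'name' declared as a string) yields a clear
    // message instead of a bare ClassCastException.
    static GroupNodeSketch asGroupOrFail(SchemaNodeSketch declared) {
        if (declared.isPrimitive()) {
            throw new IllegalStateException("Field '" + declared.getName()
                    + "' is declared as a primitive type but the data contains an object");
        }
        return (GroupNodeSketch) declared;
    }
}
```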


File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/pointables/printer/parquet/ParquetRecordVisitorUtils.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/e43d7c6f_b05e32bc
PS19, Line 139:                 switch (primitiveTypeName) {
              :                     case INT64:
              :                         recordConsumer.addLong(bigIntValue);
              :                         break;
              :                     case FLOAT:
              :                         recordConsumer.addFloat(bigIntValue);
              :                         break;
              :                     case DOUBLE:
              :                         recordConsumer.addDouble(bigIntValue);
              :                         break;
              :                     case INT32:
              :                     case BOOLEAN:
              :                     case BINARY:
              :                     case FIXED_LEN_BYTE_ARRAY:
              :                     case INT96:
              :                     default:
              :                         throw new HyracksDataException(
              :                                 "Typecast impossible from " + typeTag + " to " + primitiveTypeName);
              :                 }
Let's extract this into a function and use it for all integer variants (byte,
short, int, and long):

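// byte, short, and int values widen to long implicitly at the call site;
// recordConsumer is the same RecordConsumer used in the code at Line 139
private void addInteger(long value, ATypeTag typeTag, PrimitiveType type) throws HyracksDataException {
    PrimitiveType.PrimitiveTypeName primitiveTypeName = type.getPrimitiveTypeName();
    switch (primitiveTypeName) {
        case INT64:
            recordConsumer.addLong(value);
            break;
        case FLOAT:
            recordConsumer.addFloat(value);
            break;
        case DOUBLE:
            recordConsumer.addDouble(value);
            break;
        case INT32:
        case BOOLEAN:
        case BINARY:
        case FIXED_LEN_BYTE_ARRAY:
        case INT96:
        default:
            throw new HyracksDataException("Typecast impossible from " + typeTag + " to " + primitiveTypeName);
    }
}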
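To illustrate why a single long-based helper is enough, here is a self-contained sketch with hypothetical stand-ins for Parquet's PrimitiveTypeName and RecordConsumer. Byte, short, and int arguments widen to long implicitly at the call site, and long widens implicitly to float and double inside the helper. Note that long-to-float widening, and long-to-double widening for magnitudes above 2^53, can lose precision.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.
enum PrimitiveTypeNameSketch {
    INT64, INT32, BOOLEAN, BINARY, FLOAT, DOUBLE, FIXED_LEN_BYTE_ARRAY, INT96
}

// Hypothetical stand-in for Parquet's RecordConsumer that just records what it receives.
final class RecordingConsumerSketch {
    final List<String> calls = new ArrayList<>();

    void addLong(long value) {
        calls.add("long:" + value);
    }

    void addFloat(float value) {
        calls.add("float:" + value);
    }

    void addDouble(double value) {
        calls.add("double:" + value);
    }
}

final class IntegerWriterSketch {
    private IntegerWriterSketch() {
    }

    // One helper for every integer width; callers may pass byte, short, int, or long.
    static void addInteger(RecordingConsumerSketch consumer, long value, String typeTag,
            PrimitiveTypeNameSketch target) {
        switch (target) {
            case INT64:
                consumer.addLong(value);
                break;
            case FLOAT:
                consumer.addFloat(value); // long -> float widening, may round
                break;
            case DOUBLE:
                consumer.addDouble(value); // long -> double widening
                break;
            default:
                throw new IllegalArgumentException("Typecast impossible from " + typeTag + " to " + target);
        }
    }
}
```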



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I40dc16969e66af09cde04b460f441af666b39d51
Gerrit-Change-Number: 18209
Gerrit-PatchSet: 19
Gerrit-Owner: [email protected]
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-CC: Wail Alkowaileet <[email protected]>
Gerrit-Attention: [email protected]
Gerrit-Comment-Date: Tue, 16 Apr 2024 19:38:03 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment
