>From Wail Alkowaileet <[email protected]>:

Attention is currently required from: [email protected].
Wail Alkowaileet has posted comments on this change. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209 )

Change subject: [WIP] Support COPY TO in parquet
......................................................................


Patch Set 30:

(19 comments)

File asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/53d9e4de_ddeb6f11
PS30, Line 307: Units
unit


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/db93f1b0_788f5ae5
PS30, Line 307: , given
. Provided


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/d535f847_8ea27207
PS30, Line 308: Unsupported compression scheme
Remove and use 1096


File 
asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/ExternalDataConstants.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/32ef9cae_a4fd7021
PS30, Line 87: 1MB
Something could be off here. Both page size and row group size are the same?


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/6739fe02_2b9a9121
PS30, Line 315:     public static final String KEY_COMPRESSION_LZO = "lzo";
              :     public static final String KEY_COMPRESSION_LZ4_RAW = 
"lz4_raw";
              :     public static final String KEY_COMPRESSION_BROTLI = 
"brotli";
Can you do a simple benchmark that compares both the size and write throughput? 
If I remember correctly, zstd and snappy give the best compression and time.

*Can be done later


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/05282866_728771bb
PS30, Line 331: JSON_WRITER_SUPPORTED_COMPRESSION
TEXTUAL_WRITER_SUPPORTED_COMPRESSION
'Those' will be used for CSV as well


File 
asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/WriterValidationUtil.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/4eb4c84a_5f2e0393
PS30, Line 115: validateJSONCompression
validateTextualCompression


File asterixdb/asterix-om/pom.xml:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/367fa1de_01cb992b
PS30, Line 163:       <dependency>
Fix indentation


File 
asterixdb/asterix-om/src/main/java/org/apache/asterix/om/pointables/printer/parquet/ParquetRecordLazyVisitor.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/c154fefa_39728e4c
PS30, Line 55: ARecordType
This will fail if it is ANY. Because ANY isn't of type ARecordType
https://github.com/apache/asterixdb/blob/6d124481c4238b59b754eec83811c28596bb6d29/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/types/BuiltinType.java#L970


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/7d257cdf_73fe1cbd
PS30, Line 82: throw new 
HyracksDataException(ErrorCode.TUPLE_DOES_NOT_AGREE_WITH_GIVEN_SCHEMA
You cannot assume any exception is a typing error. For example, what if 
getOrCreateFieldNameIndex() is the one that throws an exception? Also, the 
message does not tell me anything (I have no clue what's the offending type). 
As a user, I would need more information about the offending type and what the 
actual data type is.

You need to make sure 'Type type' is a group type and it corresponds to an 
object. Take a look at line 90. It is exactly the same as this line (i.e., 67). 
How do you know if 'type' corresponds to an object or an array? You need to 
more digging in 'type' to ensure that the type is an array or object

See:
https://github.com/apache/asterixdb/blob/6d124481c4238b59b754eec83811c28596bb6d29/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/record/reader/hdfs/parquet/AsterixTypeToParquetTypeVisitor.java#L195


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/94d36201_2a7b01e5
PS30, Line 90: asGroupType
See the comment above


File 
asterixdb/asterix-om/src/main/java/org/apache/asterix/om/pointables/printer/parquet/ParquetRecordVisitorUtils.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/ecb8211d_4b436765
PS30, Line 60:             case BOOLEAN:
             :             case BINARY:
             :             case FIXED_LEN_BYTE_ARRAY:
             :             case INT96:
remove. 'default' suffices.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/a31913e3_e807657d
PS30, Line 65: HyracksDataException
throw RuntimeDataException.create(ErrorCode.TYPE_MISMATCH_GENERIC, ATypeTag, 
primitiveTypeName)


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/331bff3c_9e30dc57
PS30, Line 176:            switch (primitiveTypeName) {
              :                     case BOOLEAN:
              :                         recordConsumer.addBoolean(booleanValue);
              :                         break;
              :                     case BINARY:
              :                     case INT32:
              :                     case INT64:
              :                     case FLOAT:
              :                     case DOUBLE:
              :                     case FIXED_LEN_BYTE_ARRAY:
              :                     case INT96:
              :                     default:
              :                         throw new HyracksDataException(
              :                                 "Typecast impossible from " + 
typeTag + " to " + primitiveTypeName);
              :                 }
Replace with if.
if(!BOOLEAN.eqauls(primitiveTypeName) {
   throw ....
}
boolean booleanValue = ABooleanSerializerDeserializer.getBoolean(b, s);
recordConsumer.addBoolean(booleanValue);


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/69503f04_82051ef5
PS30, Line 200:           case BINARY:
              :             case UUID:
              :             case POINT:
              :             case DURATION:
              :             case POINT3D:
              :             case ARRAY:
              :             case MULTISET:
              :             case OBJECT:
              :             case SPARSOBJECT:
              :             case UNION:
              :             case ENUM:
              :             case TYPE:
              :             case ANY:
              :             case LINE:
              :             case POLYGON:
              :             case CIRCLE:
              :             case RECTANGLE:
              :             case INTERVAL:
              :             case SYSTEM_NULL:
              :             case YEARMONTHDURATION:
              :             case DAYTIMEDURATION:
              :             case SHORTWITHOUTTYPEINFO:
remove


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/084016dd_93e22e1d
PS30, Line 222: NULL
We should be able to handle NULL if the schema said the value is optional.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/86baa3e7_7ab0ddde
PS30, Line 223:             case GEOMETRY:
              :             case UINT8:
              :             case UINT16:
              :             case UINT32:
              :             case UINT64:
              :             case BITARRAY:
remove


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/6c76dedf_9814a3b3
PS30, Line 229: MISSING
Could treated as NULL


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209/comment/b35c37cb_bc7ed7f7
PS30, Line 230:     case DATETIME:
remove



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18209
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I40dc16969e66af09cde04b460f441af666b39d51
Gerrit-Change-Number: 18209
Gerrit-PatchSet: 30
Gerrit-Owner: [email protected]
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: [email protected]
Gerrit-CC: Wail Alkowaileet <[email protected]>
Gerrit-Attention: [email protected]
Gerrit-Comment-Date: Mon, 29 Apr 2024 19:56:33 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Reply via email to