Till Westmann has posted comments on this change.

Change subject: ASTERIXDB-1102: VarSize Encoding to store length of String and 
ByteArray
......................................................................


Patch Set 8:

(8 comments)

https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/ByteArrayPointable.java
File 
hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/ByteArrayPointable.java:

Line 106:     private int hash = 0;
Could we move those to the beginning of the class (i.e. before the first 
methods/constructor) to make the statefulness for the object more visible?


Line 169:     public static int normalize(byte[] bytesPtr, int start) {
Could you add a comment what "normalization" is in this case?


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java
File 
hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java:

Line 76:     private int stringLength;
I'm not sure that I'm happy that these pointables are stateful now, but if we 
remain careful to avoid concurrent use I think that it should be fine. 
However, I think that it would be good to put the data members to the beginning 
of the class to make it more visible that these objects are stateful.
So I'd put them before the first method.


Line 92:      * Since the {@code utf8Length} and the {@code metaLength} are 
often used, we compute those two value in advance.
s/value/values/


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java
File 
hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java:

Line 49:             int diff = estimateMetaLen - actualMetaLen;
It would be nice if this case was covered in the UTF8StringPointableTest.


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8CharSequence.java
File 
hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8CharSequence.java:

Line 55:             buf = new char[utfLen];
It seems relatively expensive to always allocate *and* copy the whole string  - 
especially as strings could be really big after this change. 
Thoughts?


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/AbstractUTF8StringBinaryTokenizer.java
File 
hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/AbstractUTF8StringBinaryTokenizer.java:

Line 59:     //TODO: This UTF8Tokenizer strongly rely on the Asterix data 
format,
s/rely/relies/


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/string/UTF8StringUtil.java
File 
hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/string/UTF8StringUtil.java:

Line 214:     public static int normalize(byte[] bytes, int start) {
Could we document what "normalize" means in this case?


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/449
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7e95df0f06984b784ebac2c84b97e56a50207d27
Gerrit-PatchSet: 8
Gerrit-Project: hyracks
Gerrit-Branch: master
Gerrit-Owner: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Preston Carman <[email protected]>
Gerrit-Reviewer: Taewoo Kim <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-HasComments: Yes

Reply via email to