Jianfeng Jia has posted comments on this change. Change subject: ASTERIXDB-1102: VarSize Encoding to store length of String and ByteArray ......................................................................
Patch Set 7: (4 comments) https://asterix-gerrit.ics.uci.edu/#/c/449/7/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringLowercasePointable.java File hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringLowercasePointable.java: Line 29: public final class UTF8StringLowercasePointable extends AbstractPointable implements IHashable, IComparable { > Why do we need a new type of pointable for special case of strings? Actually I'm against it also, but I just move it from Asterix level to Hyracks level. If you know the reason of having this special data type, we can have a way to remove it. https://asterix-gerrit.ics.uci.edu/#/c/449/7/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java File hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java: Line 177: public int ignoreCaseCompareTo(UTF8StringPointable other) { > Do you think these helper functions are common between all users of Hyracks I think yes. That's why I moved as many as AString functions to UTF8StringPointable. Since Hyracks doesn't have the ARecord or AList like data types, the string join/contat have to stay in Asterix level. In this way, the VXQuery don't need to re-create the similar functions anymore, right? https://asterix-gerrit.ics.uci.edu/#/c/449/7/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/AbstractUTF8StringBinaryTokenizer.java File hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/AbstractUTF8StringBinaryTokenizer.java: Line 59: //TODO: This UTF8Tokenizer strongly rely on the Asterix data format, > I think this would be a requirement to share the use of this function. At l Agreed. Maybe Taewoo knows more about this Tokenizer? Let me ask him about the usage. https://asterix-gerrit.ics.uci.edu/#/c/449/7/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/encoding/VarLenIntEncoderDecoder.java File hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/encoding/VarLenIntEncoderDecoder.java: Line 136: * public void encode(int val) { while(val > ENCODE_MASK) { bytes[pos++] = > Remove if unused. (also has tabs.) Done -- To view, visit https://asterix-gerrit.ics.uci.edu/449 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7e95df0f06984b784ebac2c84b97e56a50207d27 Gerrit-PatchSet: 7 Gerrit-Project: hyracks Gerrit-Branch: master Gerrit-Owner: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Preston Carman <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-HasComments: Yes
