Ian Maxon has posted comments on this change. Change subject: ASTERIXDB-1102: VarSize Encoding to store length of String and ByteArray ......................................................................
Patch Set 5: Code-Review+1 (8 comments) Few thoughts... very cool though overall. I should try this on the data that caused this to be raised in the first place, hopefully it fails in a different way now ;) https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java File hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java: Line 65: public static UTF8StringPointable generateUTF8Pointable(String testString) { Is there a reason here this parameter is named testString? Line 316: GrowableArray out) throws IOException { I agree with your comments here. If we throw an RTE when this happens, is it breaking things? If upstream things are relying on undefined behavior we should take note of it while we have the chance. https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/marshalling/UTF8StringSerializerDeserializer.java File hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/marshalling/UTF8StringSerializerDeserializer.java: Line 35: Ah ok, I think this was the Singleton thing you were referring to. Is there a reason here to have this method, then? Maybe I'm being dense but from just looking at it, I don't see the point of having a method that just calls the default constructor, if this is suppose to come out of a Factory instead. https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/bytes/Base64Parser.java File hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/bytes/Base64Parser.java: Line 23: So one question I have, is that is there no allocation-free library we might be able to use for this instead of maintaining our own base64 conversions? An example is what we do with UUIDs, though I think that is maybe a bad one since it isn't allocation free... Line 106: */ I guess I can't think of a clean way to avoid having this method duplicate code off the top of my head, but it kind of feels like there might be? I can think of some ugly things that involved boxed numeric types but obviously we want to avoid that... https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/encoding/VarLenIntEncoderDecoder.java File hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/encoding/VarLenIntEncoderDecoder.java: Line 133: Is there a reason this is commented out? Is it just not used? https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/string/UTF8StringUtil.java File hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/string/UTF8StringUtil.java: Line 27: Note from discussion: This is really just migrated from asterix-fuzzyjoin and hyracks-storage-am-invertedindex Line 143: // a little speedup for ascii chars >From discussion, let's just cut this and always return Character.toLowerCase(); -- To view, visit https://asterix-gerrit.ics.uci.edu/449 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7e95df0f06984b784ebac2c84b97e56a50207d27 Gerrit-PatchSet: 5 Gerrit-Project: hyracks Gerrit-Branch: master Gerrit-Owner: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Preston Carman <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-HasComments: Yes
