Jianfeng Jia has posted comments on this change. Change subject: ASTERIXDB-1102: VarSize Encoding to store length of String and ByteArray ......................................................................
Patch Set 5: (8 comments) https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java File hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java: Line 65: public static UTF8StringPointable generateUTF8Pointable(String testString) { > Is there a reason here this parameter is named testString? Done Line 316: GrowableArray out) throws IOException { > I agree with your comments here. If we throw an RTE when this happens, is i I guess we don't have the "warning" mechanism in the system now. So I will stick to the original implementation. https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/marshalling/UTF8StringSerializerDeserializer.java File hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/marshalling/UTF8StringSerializerDeserializer.java: Line 35: > Ah ok, I think this was the Singleton thing you were referring to. Is there Good point! There is no reason to have such confusing method name anymore. I replaced the INSTANCE method to a public constructor https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/bytes/Base64Parser.java File hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/bytes/Base64Parser.java: Line 23: > So one question I have, is that is there no allocation-free library we migh I searched for a while, it seems our only-generate-once object program style is really odd for the utility package. I guess most of the utilities need to considerate the multi-thread scenario. While in our style none of our utility is thread-safe. Instead, we rely on the outside factory pattern to make sure there is one instance per thread. Given that, sadly it really hard to directly reuse the 3rd party libraries I'm really curious about the initial experiment to compare the overhead of this object generation. Line 106: */ > I guess I can't think of a clean way to avoid having this method duplicate I can't think a nice way without creating any temporary buffer neither. I split the fillup quadruplet part into a separate function to reduce the redundancy a little. https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/encoding/VarLenIntEncoderDecoder.java File hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/encoding/VarLenIntEncoderDecoder.java: Line 133: > Is there a reason this is commented out? Is it just not used? The comment is originally here which shows a complementary way of encoding and decoding, it might be used in the future. https://asterix-gerrit.ics.uci.edu/#/c/449/5/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/string/UTF8StringUtil.java File hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/string/UTF8StringUtil.java: Line 27: > Note from discussion: This is really just migrated from asterix-fuzzyjoin a Done Line 143: // a little speedup for ascii chars > From discussion, let's just cut this and always return Character.toLowerCas Done -- To view, visit https://asterix-gerrit.ics.uci.edu/449 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7e95df0f06984b784ebac2c84b97e56a50207d27 Gerrit-PatchSet: 5 Gerrit-Project: hyracks Gerrit-Branch: master Gerrit-Owner: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Preston Carman <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-HasComments: Yes
