>From Michael Blow <[email protected]>: Michael Blow has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064 )
Change subject: [NO ISSUE][HYR][STO] Transform long filenames / paths ...................................................................... Patch Set 6: Code-Review+1 (10 comments) https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java@59 PS4, Line 59: try { : registry.load(new InputStreamReader(new FileInputStream(registryFile), StandardCharsets.UTF_8)); > try with resources for the streams? Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java@84 PS4, Line 84: if (!candidate.equals(finalPath)) { > why candidate and not originalpath? we only store in the registry if length > needs to be translated? i went back & forth on this, but decided to keep the registry minimally-sized- it wouldn't take much convincing to flop back- the trade-off is registry size -vs- the one-time re-evaluation of a "safe" path each startup of the NC- it didn't seem too bad to do the duplicate work, since it's only once per boot... https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java@118 PS4, Line 118: replacement += nextFixupToken(); > javac translates this sorta stuff into stringbuilder now right? yeah https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java@131 PS4, Line 131: try { : registry.store(new FileWriterWithEncoding(registryFile, StandardCharsets.UTF_8), : "Generated by " + getClass().getName()); > try with resources? also i think this does flush but is that sufficient to > ensure durability? Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java@180 PS4, Line 180: "~" > lol MICROS~1 Ack https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java@184 PS4, Line 184: return utf8Length(file.getAbsolutePath()); > what if the path is not normalized? Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java@187 PS4, Line 187: protected static int utf8PathLength(Path pathSegment) { > unused? Done https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/SafePathTransformer.java@198 PS4, Line 198: return element.substring(0, pathSegmentBytesLimit - 10 - MAX_FIXUP_LENGTH) + "..." : + element.substring(element.length() - 10); > doesn't there need to be some iteration here to figure out how many chars to > remove to scrape under […] no, we only patching one segment here- so we know that we will now be under the limit with the substrings + the fixup we add after we return https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/UnixPathTransformerFactory.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/UnixPathTransformerFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/UnixPathTransformerFactory.java@34 PS4, Line 34: private static final int PATH_OVERALL_BYTES_LIMIT = Integer : .getInteger(UnixPathTransformerFactory.class.getName() + ".pathOverallBytesLimit", Integer.MAX_VALUE); > from poking around in google it seems like this is 4096 in linux (at least as > a lower bound)? but yet it's not, in actuality- linux has this compiled into the headers but most (all?) filesystems don't have any such limit -- not really sure what to do here, but i guess conservative is safest https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/UnixPathTransformerFactory.java@43 PS4, Line 43: > isn't NUL (\0) usually illegal? do we allow NUL \0 in our parser? i guess i didn't try- i updated this to replace -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10064 To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-Project: asterixdb Gerrit-Branch: cheshire-cat Gerrit-Change-Id: I26d7223611cc68e0f99c9c9eead9c0a344a053e7 Gerrit-Change-Number: 10064 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Blow <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Michael Blow <[email protected]> Gerrit-Reviewer: Murtadha Hubail <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-CC: Ian Maxon <[email protected]> Gerrit-Comment-Date: Fri, 26 Feb 2021 20:50:02 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: Yes Comment-In-Reply-To: Ian Maxon <[email protected]> Gerrit-MessageType: comment
