lidavidm commented on code in PR #35920:
URL: https://github.com/apache/arrow/pull/35920#discussion_r1224427836
##########
java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java:
##########
@@ -135,6 +135,7 @@ public void run() throws IOException {
Preconditions.checkState(reader.bytesRead() == writer.bytesWritten());
LOGGER.debug(String.format("Echoed %d records", echoed));
reader.close(false);
+ writer.close();
Review Comment:
Thanks. I think it's reasonable to expect that you must close the writer
anyways (and that you should generally try-with-resources these writers).
##########
java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowStreamWriter.java:
##########
@@ -121,4 +129,45 @@ public static void writeEndOfStream(WriteChannel out,
IpcOption option) throws I
protected void endInternal(WriteChannel out) throws IOException {
writeEndOfStream(out, option);
}
+
+ @Override
+ protected void ensureDictionariesWritten(DictionaryProvider provider,
Set<Long> dictionaryIdsUsed)
Review Comment:
Oof, so for each batch we write, we need to re-compare all dictionaries?
That may also be worth noting in the PR description.
##########
java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowStreamWriter.java:
##########
@@ -121,4 +129,45 @@ public static void writeEndOfStream(WriteChannel out,
IpcOption option) throws I
protected void endInternal(WriteChannel out) throws IOException {
writeEndOfStream(out, option);
}
+
+ @Override
+ protected void ensureDictionariesWritten(DictionaryProvider provider,
Set<Long> dictionaryIdsUsed)
Review Comment:
That said, this is broadly in line with the C++ implementation:
https://github.com/apache/arrow/blob/766e25440250dde9117e19245389badb5ed7678c/cpp/src/arrow/ipc/writer.cc#L1121
The C++ implementation takes a number of shortcuts to avoid having to
compare the full data buffers on each batch. However, because vectors are
mutable in the Java implementation, we can't easily make the same optimization
here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]