clintropolis commented on code in PR #17460:
URL: https://github.com/apache/druid/pull/17460#discussion_r1841079053
##########
processing/src/main/java/org/apache/druid/common/utils/SerializerUtils.java:
##########
@@ -224,4 +239,83 @@ public int getSerializedStringByteSize(String str)
{
return Integer.BYTES + StringUtils.toUtf8(str).length;
}
+
+
+ /**
+ * Write a {@link Serializer} to a {@link Path} and then map it as a
read-only {@link MappedByteBuffer}
+ */
+ @SuppressFBWarnings("NP_NONNULL_PARAM_VIOLATION")
Review Comment:
the answer to this is a bit complicated, yea some serializers do not require
a `FileSmoosher` (basically everything that does not support "large" files
greater than 2gb does not need a smoosher). Technically none of the current
callers need a smoosher since the new caller is just taking an `IntBuffer` and
persisting/mapping it, and the other callers are using `FixedIndexed` which
does not support large files yet, but accepting a `Serializer` here is kind of
risky for future misuse.
I think I like two options here.
One option is that `mapSerializer` just makes its own
`FileSmoosher`/`SmooshedWriter` and just write the serializer into a temporary
smoosh at the same name as the temp file name, and then just return the loaded
smoosh instead of a mapped byte buffer directly. This is actually already what
`DictionaryIdLookup` (where i refactored this code from) is doing for string
dictionaries,
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/nested/DictionaryIdLookup.java#L235
and would let it work with any serializer implementation, and presumably the
thing we want to read from the mapped file knows how to load other stuff from
the smoosher.
The other option that seems worth considering is as you said, instead of
passing in `Serializer` just pass in some other interface that writes with just
`WriteableByteChannel` and a length argument with the expectation that its only
going to write to the channel and then map the byte buffer and it fits in the
buffer. This solves the immediate use case I think, but is sort of limited as
long as we are stuck with `ByteBuffer`.
I think I'm sort of leaning towards the 'make a new smoosh' option since it
seems the most flexible, its like "make a smoosh file that contains only the
contents of this serializer", it does have a little extra overhead for things
that do fit in a single buffer, but doesn't seem too bad. Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]