lwhite1 commented on code in PR #14213:
URL: https://github.com/apache/arrow/pull/14213#discussion_r979054721
##########
docs/source/java/vector.rst:
##########
@@ -268,6 +268,82 @@ For example, the code below shows how to build a
:class:`ListVector` of int's us
}
}
+Dictionary Encoding
+===================
+
+A :class:`FieldVector` can be dictionary encoded for performance or improved
memory efficiency. While this is most often done with :class:`VarCharVector`,
nearly any type of vector might be encoded if there are many values, but few
unique values.
+
+There are a few steps involved in the encoding process:
+
+1. Create a regular, un-encoded vector and populate it
+2. Create a dictionary vector of the same type as the un-encoded vector. This
vector must have the same values, but each unique value in the un-encoded
vector need appear here only once.
+3. Create a :class:`Dictionary`. It will contain the dictionary vector, plus a
:class:`DictionaryEncoding` object that holds the encoding's metadata and
settings values.
+4. Create a :class:`DictionaryEncoder`.
+5. Call the encode() method on the :class:`DictionaryEncoder` to produce an
encoded version of the original vector.
+6. (Optional) Call the decode() method on the :class:`DictionaryEncoder`,
passing it the encoded vector, to re-create the original values.
+
+The encoded values will be integers. Depending on how many unique values you
have, you can use either TinyIntVector, SmallIntVector, or IntVector to hold
them. You specify the type when you create your :class:`DictionaryEncoding`
instance. You might wonder where those integers come from: the dictionary
vector is a regular vector, so the value's index position in that vector is
used as its encoded value.
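
To make the last point in the hunk concrete (the encoded value is the value's index position in the dictionary vector), here is a minimal plain-Java sketch of the idea. It is not Arrow's implementation and deliberately uses no Arrow classes: the class `DictionaryEncodingSketch` and its methods `buildDictionary`, `encode`, `decode`, and `indexBitWidth` are hypothetical names chosen for illustration, and ordinary lists and arrays stand in for vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical helper, not part of Apache Arrow.
public class DictionaryEncodingSketch {

    /** Keeps each distinct value once, in order of first appearance. */
    public static List<String> buildDictionary(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    /** Replaces every value with its index position in the dictionary. */
    public static int[] encode(List<String> values, List<String> dictionary) {
        Map<String, Integer> indexOf = new HashMap<>();
        for (int i = 0; i < dictionary.size(); i++) {
            indexOf.put(dictionary.get(i), i);
        }
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            Integer index = indexOf.get(values.get(i));
            if (index == null) {
                throw new IllegalArgumentException(
                        "Value not in dictionary: " + values.get(i));
            }
            indices[i] = index;
        }
        return indices;
    }

    /** Looks each index up in the dictionary to recover the original values. */
    public static List<String> decode(int[] indices, List<String> dictionary) {
        List<String> decoded = new ArrayList<>(indices.length);
        for (int index : indices) {
            decoded.add(dictionary.get(index));
        }
        return decoded;
    }

    /** Smallest signed width in bits (8, 16, or 32) that can hold every index. */
    public static int indexBitWidth(int dictionarySize) {
        if (dictionarySize <= Byte.MAX_VALUE + 1) {
            return 8;   // indices 0..127 fit in a signed byte
        }
        if (dictionarySize <= Short.MAX_VALUE + 1) {
            return 16;  // indices 0..32767 fit in a signed short
        }
        return 32;
    }

    public static void main(String[] args) {
        List<String> values = List.of("red", "green", "red", "blue", "green");
        List<String> dictionary = buildDictionary(values);
        int[] encoded = encode(values, dictionary);

        System.out.println(dictionary);                  // [red, green, blue]
        System.out.println(Arrays.toString(encoded));    // [0, 1, 0, 2, 1]
        System.out.println(decode(encoded, dictionary)); // original values again
        System.out.println(indexBitWidth(dictionary.size()) + "-bit indices");
    }
}
```

In the terms used by the hunk, `dictionary` plays the role of the dictionary vector, `encoded` the encoded vector, and the 8, 16, and 32 bit widths correspond to holding the indices in a TinyIntVector, SmallIntVector, or IntVector respectively.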
Review Comment:
FWIW, I had no idea it was supposed to link. I've never worked with rst
before. The highlighting is nice, though.
I can add the class markup.