jorisvandenbossche commented on a change in pull request #1: URL: https://github.com/apache/arrow-cookbook/pull/1#discussion_r686021413
########## File path: python/source/create.rst ########## @@ -0,0 +1,111 @@ +====================== +Creating Arrow Objects +====================== + +Recipes related to the creation of Arrays, Tables, +Tensors and all other Arrow entities. + +.. contents:: + +Create Table from Plain Types +============================= + +Arrow allows fast zero copy creation of arrow arrays +from numpy and pandas arrays and series, but it's also +possible to create Arrow Arrays and Tables from +plain Python structures. + +the :func:`pyarrow.table` function allows creation of Tables +from a variety of inputs, including plain python objects + +.. testcode:: + + import pyarrow as pa + + table = pa.table({ + "col1": [1, 2, 3, 4, 5], + "col2": ["a", "b", "c", "d", "e"] + }) + + print(table) + +.. testoutput:: + + pyarrow.Table + col1: int64 + col2: string + +.. note:: + + All values provided in the dictionary will be passed to + :func:`pyarrow.array` for conversion to Arrow arrays, + and will benefit from zero copy behaviour when possible. + +Store Categorical Data +====================== + +Arrow provides the :class:`pyarrow.DictionaryArray` type +to represent categorical data without the cost of +storing and repeating the categories over and over. This can reduce memory use +when columns might have large values (such as text). + +If you have an array containing repeated categorical data, +it is possible to convert it to a :class:`pyarrow.DictionaryArray` +using :meth:`pyarrow.Array.dictionary_encode` + +.. testcode:: + + arr = pa.array(["red", "green", "blue", "blue", "green", "red"]) + + categorical = arr.dictionary_encode() + print(categorical) + +.. testoutput:: + + ... + -- dictionary: + [ + "red", + "green", + "blue" + ] + -- indices: + [ + 0, + 1, + 2, + 2, + 1, + 0 + ] + +If you already know the categories and indices then you can skip the encode +step and directly create the ``DictionaryArray`` using +:meth:`pyarrow.DictionaryArray.from_arrays` Review comment: Might also be worth mentioning that pyarrow also supports creating a DictionaryArray directly from a pandas.Categorical, in case you would have that already. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org