[ https://issues.apache.org/jira/browse/ARROW-17834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609607#comment-17609607 ]
Chang She commented on ARROW-17834:
-----------------------------------

One additional tricky thing here is what happens if the storage array also needs additional arguments. Most open-source datasets in computer vision have a predetermined dictionary, so you'd often want to read in a CSV data dictionary and pass in the class names in the right order to construct the storage DictionaryArray.

> [Python] Allow creating ExtensionArray through pa.array(..) constructor
> -----------------------------------------------------------------------
>
>                 Key: ARROW-17834
>                 URL: https://issues.apache.org/jira/browse/ARROW-17834
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> Currently, creating an ExtensionArray from a python sequence (or numpy array,
> ..) requires the following:
> {code:python}
> import pyarrow as pa
> from pyarrow.tests.test_extension_type import IntegerType
> storage_array = pa.array([1, 2, 3])
> ext_arr = pa.ExtensionArray.from_storage(IntegerType(), storage_array)
> {code}
> While doing this directly in {{pa.array(..)}} doesn't work:
> {code:python}
> >>> pa.array([1, 2, 3], type=IntegerType())
> ArrowNotImplementedError: extension
> {code}
> I think it should be possible to basically do the ExtensionArray.from_storage
> under the hood in {{pa.array(..)}} when the specified type is an extension
> type?
> I think this should also enable converting from a pandas DataFrame (with a
> column with matching storage values) to a Table with a specified schema that
> includes an extension type. Like:
> {code}
> import pandas as pd
> df = pd.DataFrame({'a': [1, 2, 3]})
> pa.table(df, schema=pa.schema([('a', IntegerType())]))
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)