[ https://issues.apache.org/jira/browse/ARROW-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957733#comment-16957733 ]
Joris Van den Bossche commented on ARROW-6968:
----------------------------------------------

Hi [~mwheeler-hdai], this was a backwards-incompatible change in pyarrow 0.15.0. The {{Column}} class (a small wrapper around {{ChunkedArray}}) was removed, and a column of a Table is now returned as a {{ChunkedArray}}. In most cases a {{ChunkedArray}} behaves like the removed {{Column}} and offers similar functionality, but one difference is that {{ChunkedArray}} has no {{name}} attribute.

You could replace
{code}
map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
{code}
with, e.g.,
{code}
map_col_names_to_incides = {name: i for i, name in enumerate(table.column_names)}
{code}
as {{table.column_names}} is guaranteed to be in the correct column order (another option: {{dict(zip(table.column_names, range(table.num_columns)))}}).

> [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError
> ---------------------------------------------------------
>
>                 Key: ARROW-6968
>                 URL: https://issues.apache.org/jira/browse/ARROW-6968
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>         Environment: Python 3.7.4 on macOS Mojave 10.14.6
> Python 3.6.7 on Ubuntu 16.04.6 LTS
>            Reporter: Michael Wheeler
>            Priority: Major
>         Attachments: attribute_error_pyarrow_0_15_0.py
>
>
> The code in question:
> {code:python}
> """
> Reproduce AttributeError with PyArrow == 0.15.0
> """
> import io
> import logging
> import pandas
> import pyarrow
> import sys
> import textwrap
>
> logging.basicConfig(level=logging.DEBUG)
> logging.debug(f'Python v{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')
> logging.debug(f'PyArrow v{pyarrow.__version__}' + '\n')
>
> CSV_TEXT = textwrap.dedent("""\
> id,gender,some_date,age
> 001,M,01/01/2019,75
> 002,F,02/02/2018,32
> 003,M,03/03/2017,27
> 004,F,04/04/2016,19
> 005,M,05/05/2015,55
> 006,F,06/06/2014,42
> """)
>
> # Initialize pyarrow table via pandas
> mock_file = io.StringIO(CSV_TEXT)
> df = pandas.read_csv(mock_file).sort_values(['age', 'gender'])
> table = pyarrow.Table.from_pandas(df=df)
>
> # This comprehension generates a map between the name of the column and its index
> map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
>
> logging.debug('The column indices are:')
> for name, index in map_col_names_to_incides.items():
>     logging.debug(f'Col {name} -> #{index}')
> {code}
>
> Expected result (generated with 0.14.1):
> {code}
> DEBUG:root:Python v3.7.4
> DEBUG:root:PyArrow v0.14.1
> DEBUG:root:The column indices are:
> DEBUG:root:Col id -> #0
> DEBUG:root:Col gender -> #1
> DEBUG:root:Col some_date -> #2
> DEBUG:root:Col age -> #3
> DEBUG:root:Col __index_level_0__ -> #4
> {code}
> Actual result (generated with 0.15.0):
> {code}
> DEBUG:root:Python v3.7.4
> DEBUG:root:PyArrow v0.15.0
> Traceback (most recent call last):
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1758, in <module>
>     main()
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1752, in main
>     globals = debugger.run(setup['file'], None, None, is_module)
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1147, in run
>     pydev_imports.execfile(file, globals, locals) # execute the script
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", line 31, in <module>
>     map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
>   File "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", line 31, in <dictcomp>
>     map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
> AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'name'
> {code}
>
> This error occurs in both of the environments specified above.
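A minimal, self-contained sketch of the replacement suggested in the comment above, assuming pyarrow 0.15.0 or later. The helper name {{map_col_names_to_indices}} is hypothetical (for illustration only), and the small demo table stands in for the reporter's CSV data. Because the helper only needs the ordered list of names, it can be fed {{table.column_names}} directly and checked without pyarrow installed; the pyarrow part runs only if the import succeeds.

```python
from typing import Dict, Sequence


def map_col_names_to_indices(column_names: Sequence[str]) -> Dict[str, int]:
    """Map each column name to its position (hypothetical helper).

    Since pyarrow 0.15.0, Table.columns yields ChunkedArray objects, which
    have no 'name' attribute, so the names come from Table.column_names,
    which lists them in column order.
    """
    # enumerate() pairs each name with its position in the table.
    # If a name occurs more than once, the last position wins (plain dict semantics).
    return {name: i for i, name in enumerate(column_names)}


if __name__ == "__main__":
    try:
        import pyarrow
    except ImportError:  # keeps the sketch runnable without pyarrow installed
        pyarrow = None

    # Pure-Python usage with the column names from the reporter's expected output.
    print(map_col_names_to_indices(["id", "gender", "some_date", "age", "__index_level_0__"]))

    if pyarrow is not None:
        # Small stand-in table; the data is illustrative only.
        table = pyarrow.Table.from_pydict({"id": [1, 2], "gender": ["M", "F"], "age": [75, 32]})
        print(map_col_names_to_indices(table.column_names))
        # The alternative one-liner from the comment gives the same mapping.
        print(dict(zip(table.column_names, range(table.num_columns))))
        # For a single column, the schema can also look up the position directly.
        print(table.schema.get_field_index("age"))
```

Enumerating {{table.column_names}} also avoids calling {{table.columns.index(item)}} once per column, which re-scans the column list each time.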