[ 
https://issues.apache.org/jira/browse/ARROW-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957733#comment-16957733
 ] 

Joris Van den Bossche commented on ARROW-6968:
----------------------------------------------

Hi [~mwheeler-hdai], this was a backwards-incompatible change in pyarrow 
0.15.0. The {{Column}} class (a small wrapper around {{ChunkedArray}}) was 
removed, and a column of a Table is now returned as a {{ChunkedArray}}. In most 
cases a {{ChunkedArray}} behaves similarly to the removed {{Column}} and offers 
similar functionality, but one of the differences is that {{ChunkedArray}} has 
no {{name}} attribute.

You could replace the

{code}
map_col_names_to_incides = {item.name: table.columns.index(item) for item in 
table.columns}
{code}

with, e.g.,

{code}
map_col_names_to_incides = {name: i for i, name in 
enumerate(table.column_names)} 
{code}

since {{table.column_names}} is guaranteed to be in the same order as the 
columns (another option: {{dict(zip(table.column_names, range(table.num_columns)))}}).
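For illustration, here is a minimal, self-contained sketch of the replacement above. The helper name {{column_index_map}} is hypothetical (it is not part of pyarrow), and the hard-coded name list is only an assumption standing in for what {{table.column_names}} would return for the table in the report. The pyarrow part runs only if pyarrow is installed.

```python
def column_index_map(column_names):
    """Map each column name to its position in the given sequence.

    Hypothetical helper, not a pyarrow API. It only needs the ordered
    list of names, so it does not depend on a per-column 'name' attribute.
    """
    return {name: i for i, name in enumerate(column_names)}


# Assumed stand-in for table.column_names from the reported script.
column_names = ["id", "gender", "some_date", "age", "__index_level_0__"]
name_to_index = column_index_map(column_names)

for name, index in name_to_index.items():
    print(f"Col {name} -> #{index}")

# Optional check against a real (small, made-up) pyarrow Table.
try:
    import pyarrow as pa
except ImportError:
    pa = None

if pa is not None:
    table = pa.Table.from_arrays(
        [pa.array([1, 2]), pa.array([75, 32])],
        names=["id", "age"],
    )
    # The names come from the table itself, not from the columns.
    print(column_index_map(table.column_names))
```

Because the mapping is built from the names alone, it should not matter whether the individual columns are {{Column}} or {{ChunkedArray}} objects.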




> [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError
> ---------------------------------------------------------
>
>                 Key: ARROW-6968
>                 URL: https://issues.apache.org/jira/browse/ARROW-6968
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>         Environment: Python 3.7.4 on macOS Mojave 10.14.6
> Python 3.6.7 on Ubuntu 16.04.6 LTS
>            Reporter: Michael Wheeler
>            Priority: Major
>         Attachments: attribute_error_pyarrow_0_15_0.py
>
>
> The code in question:
> {code:java}
> """
> Reproduce AttributeError with PyArrow == 0.15.0
> """
> import io
> import logging
> import pandas
> import pyarrow
> import sys
> import textwrap
> logging.basicConfig(level=logging.DEBUG)
> logging.debug(f'Python v{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')
> logging.debug(f'PyArrow v{pyarrow.__version__}' + '\n')
> CSV_TEXT = textwrap.dedent("""\
>               id,gender,some_date,age
>               001,M,01/01/2019,75
>               002,F,02/02/2018,32
>               003,M,03/03/2017,27
>               004,F,04/04/2016,19
>               005,M,05/05/2015,55
>               006,F,06/06/2014,42
>               """)
> # Initialize pyarrow table via pandas
> mock_file = io.StringIO(CSV_TEXT)
> df = pandas.read_csv(mock_file).sort_values(['age', 'gender'])
> table = pyarrow.Table.from_pandas(df=df)
> # This comprehension generates a map between the name of the column and its index
> map_col_names_to_incides = {item.name: table.columns.index(item) for item in 
> table.columns}
> logging.debug('The column indices are:')
> for name, index in map_col_names_to_incides.items():
>     logging.debug(f'Col {name} -> #{index}')
> {code}
>  
> Expected result (generated with 0.14.1):
> {code:java}
> DEBUG:root:Python v3.7.4
> DEBUG:root:PyArrow v0.14.1
> DEBUG:root:The column indices are:
> DEBUG:root:Col id -> #0
> DEBUG:root:Col gender -> #1
> DEBUG:root:Col some_date -> #2
> DEBUG:root:Col age -> #3
> DEBUG:root:Col __index_level_0__ -> #4
> {code}
> Actual result (generated with 0.15.0):
> {code:java}
> DEBUG:root:Python v3.7.4
> DEBUG:root:PyArrow v0.15.0
> Traceback (most recent call last):
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1758, in <module>
>     main()
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1752, in main
>     globals = debugger.run(setup['file'], None, None, is_module)
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1147, in run
>     pydev_imports.execfile(file, globals, locals)  # execute the script
>   File 
> "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py",
>  line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File 
> "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", 
> line 31, in <module>
>     map_col_names_to_incides = {item.name: table.columns.index(item) for item 
> in table.columns}
>   File 
> "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", 
> line 31, in <dictcomp>
>     map_col_names_to_incides = {item.name: table.columns.index(item) for item 
> in table.columns}
> AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'name'
> {code}
>  
> This error occurs in both of the environments specified above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
