[jira] [Commented] (BEAM-9152) Hadoop Downloader Range not correct

Jean-Christophe CARLES (Jira) Thu, 19 May 2022 04:52:04 -0700


    [ https://issues.apache.org/jira/browse/BEAM-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539491#comment-17539491 ]


Jean-Christophe CARLES commented on BEAM-9152:
----------------------------------------------

It looks like the issue is fixed by patching `apache_beam/io/hadoopfilesystem.py`
as follows, changing the `length` argument in the `get_range` method of the
`HdfsDownloader` class:

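```python
# Before: requests end - start + 1 bytes, i.e. the inclusive range [start, end]
length=end - start + 1
# After: requests end - start bytes, i.e. the half-open range [start, end)
length=end - start
```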
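To make the off-by-one concrete, here is a minimal sketch, assuming (as the issue describes) that `get_range(start, end)` is meant to return the half-open range `[start, end)`. An in-memory `bytes` object stands in for the HDFS file, and `read_at`, `get_range_buggy` and `get_range_fixed` are hypothetical helpers for illustration, not Beam's actual code.

```python
def read_at(data: bytes, offset: int, length: int) -> bytes:
    """Hypothetical stand-in for an HDFS offset/length read."""
    return data[offset:offset + length]


def get_range_buggy(data: bytes, start: int, end: int) -> bytes:
    # Inclusive length: fetches [start, end], one byte too many.
    return read_at(data, start, end - start + 1)


def get_range_fixed(data: bytes, start: int, end: int) -> bytes:
    # Half-open length: fetches exactly [start, end).
    return read_at(data, start, end - start)


if __name__ == "__main__":
    payload = b"0123456789"
    # Two consecutive 4-byte requests, as a buffered reader would issue them.
    print(get_range_buggy(payload, 0, 4), get_range_buggy(payload, 4, 8))
    # buggy: b'01234' b'45678'
    print(get_range_fixed(payload, 0, 4), get_range_fixed(payload, 4, 8))
    # fixed: b'0123' b'4567'
```

With the `+ 1`, each 4-byte request returns 5 bytes and consecutive chunks overlap by one byte (`4` is read twice); without it, the chunks tile the file exactly.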

It feels odd that this has not been noticed more often, though. Is anyone else
using the Python SDK to read from HDFS? Maybe we are doing something wrong...

> Hadoop Downloader Range not correct
> -----------------------------------
>
>                 Key: BEAM-9152
>                 URL: https://issues.apache.org/jira/browse/BEAM-9152
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-tfrecord
>            Reporter: Fangyuan Zhou
>            Priority: P3
>             Fix For: Missing
>
>
> I found that `HdfsDownloader.get_range(self, start, end)` returns the range
> [start, end] rather than [start, end). This causes an error when reading an
> HDFS file:
>  
>   File "/data/anaconda3/envs/tfdv1.15/lib/python3.7/site-packages/apache_beam/io/tfrecordio.py", line 127, in read_record
>     buf = file_handle.read(buf_length_expected)
>   File "/data/anaconda3/envs/tfdv1.15/lib/python3.7/site-packages/apache_beam/io/filesystemio.py", line 123, in readinto
>     b[:len(data)] = data
> ValueError: memoryview assignment: lvalue and rvalue have different structures
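
The failure in the traceback above can be reproduced without HDFS. This is a simplified, hypothetical re-creation of the `readinto` step (the `b[:len(data)] = data` line), not Beam's actual `filesystemio.py`; `readinto_sketch`, `fetch_inclusive` and `fetch_half_open` are illustrative names. Here `b` is assumed to be a `memoryview`, which is what the error message implies: if the downloader returns `len(b) + 1` bytes, the slice `b[:len(data)]` is clipped to `len(b)` bytes and the assignment raises.

```python
from typing import Callable

PAYLOAD = b"0123456789"  # stand-in for the file contents


def fetch_inclusive(start: int, end: int) -> bytes:
    # Buggy behaviour: returns [start, end], i.e. end - start + 1 bytes.
    return PAYLOAD[start:end + 1]


def fetch_half_open(start: int, end: int) -> bytes:
    # Expected behaviour: returns [start, end), i.e. end - start bytes.
    return PAYLOAD[start:end]


def readinto_sketch(b: memoryview, position: int, size: int,
                    fetch: Callable[[int, int], bytes]) -> int:
    """Simplified readinto: copy bytes [position, end) into b via fetch."""
    if position >= size:
        return 0  # EOF
    end = min(position + len(b), size)
    data = fetch(position, end)
    # b[:len(data)] is never longer than len(b), so if fetch returned extra
    # bytes the two sides differ in length and memoryview raises ValueError.
    b[:len(data)] = data
    return len(data)


if __name__ == "__main__":
    buf = memoryview(bytearray(4))
    print(readinto_sketch(buf, 0, len(PAYLOAD), fetch_half_open), bytes(buf))
    # prints: 4 b'0123'
    try:
        readinto_sketch(memoryview(bytearray(4)), 0, len(PAYLOAD),
                        fetch_inclusive)
    except ValueError as exc:
        # Same message as in the traceback above.
        print("ValueError:", exc)
```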



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
