[ 
https://issues.apache.org/jira/browse/BEAM-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122006#comment-17122006
 ] 

Kenneth Knowles commented on BEAM-7496:
---------------------------------------

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> python sdk fail to read data from HDFS
> --------------------------------------
>
>                 Key: BEAM-7496
>                 URL: https://issues.apache.org/jira/browse/BEAM-7496
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-hadoop
>    Affects Versions: 2.12.0
>            Reporter: Mingliang Qi
>            Assignee: Udi Meiri
>            Priority: P2
>              Labels: stale-assigned
>
> There was an index bug in hadoopfilesystem.py 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/hadoopfilesystem.py#L72]
> The end index should not be included according to the inherited function 
> description: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystemio.py#L51]
> This will leads to following Error message during runtime:
> {code:java}
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 
> 406, in run
>     self._options).run(False)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 
> 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/direct_runner.py",
>  line 132, in run_pipeline
>     return runner.run_pipeline(pipeline, options)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 276, in run_pipeline
>     default_environment=self._default_environment))
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 280, in run_via_runner_api
>     return self.run_stages(*self.create_stages(pipeline_proto))
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 356, in run_stages
>     stage_context.safe_coders)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 511, in run_stage
>    data_input, data_output)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1217, in process_bundle
>     result_future = self._controller.control_handler.push(process_bundle)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 832, in push
>     response = self.worker.do_instruction(request)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 312, in do_instruction
>     request.instruction_id)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 331, in process_bundle
>     bundle_processor.process_bundle(instruction_id))
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 554, in process_bundle
>     ].process_encoded(data.data)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 140, in process_encoded
>     self.output(decoded_value)
>   File "apache_beam/runners/worker/operations.py", line 245, in 
> apache_beam.runners.worker.operations.Operation.output
>   File "apache_beam/runners/worker/operations.py", line 246, in 
> apache_beam.runners.worker.operations.Operation.output
>   File "apache_beam/runners/worker/operations.py", line 142, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive
>   File "apache_beam/runners/worker/operations.py", line 396, in 
> apache_beam.runners.worker.operations.ImpulseReadOperation.process
>   File "apache_beam/runners/worker/operations.py", line 398, in 
> apache_beam.runners.worker.operations.ImpulseReadOperation.process
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/concat_source.py", 
> line 86, in read
>     range_tracker.sub_range_tracker(source_ix)):
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", 
> line 183, in read_records
>     record = _TFRecordUtil.read_record(file_handle)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", 
> line 123, in read_record
>     buf = file_handle.read(buf_length_expected)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystem.py", 
> line 261, in read
>     self._fetch_to_internal_buffer(num_bytes)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystem.py", 
> line 210, in _fetch_to_internal_buffer
>     buf = self._file.read(self._read_size)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystemio.py", line 
> 112, in readinto
>     b[:len(data)] = data
> ValueError: cannot modify size of memoryview object
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to