[ 
https://issues.apache.org/jira/browse/BEAM-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664147#comment-16664147
 ] 

Chamikara Jayalath commented on BEAM-5856:
------------------------------------------

I think it's possible to add UTF-16 support to the existing text source 
without too many changes.

Basically, we have to make sure that we extract the encoded bytes of each line 
correctly (in UTF-16 a newline is two bytes, not one) and that we decode those 
bytes with a proper coder. We also have to increase the test coverage to make 
sure that we parse data without losing records.
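A minimal sketch of the two steps above in plain Python, assuming the whole file is already in memory as bytes (the real text source reads buffered blocks and handles bundle splits, which this ignores). The helpers `detect_utf16`, `split_encoded_lines` and `read_utf16_lines` are hypothetical and not part of Beam's `textio`. The key point is that the newline is matched as a full 2-byte code unit at aligned offsets only, so the bytes of one line never spill into the next record. It also assumes a file without a BOM is little-endian.

```python
import codecs


def detect_utf16(data):
    """Return (codec name, payload without BOM) for a UTF-16 byte buffer.

    Assumption: a buffer without a BOM is treated as little-endian.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return 'utf-16-le', data[len(codecs.BOM_UTF16_LE):]
    if data.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16-be', data[len(codecs.BOM_UTF16_BE):]
    return 'utf-16-le', data


def split_encoded_lines(data, encoding):
    """Yield the encoded bytes of each line, excluding the newline itself."""
    newline = '\n'.encode(encoding)  # b'\n\x00' (LE) or b'\x00\n' (BE)
    width = len(newline)  # one UTF-16 code unit = 2 bytes
    start = 0
    # Compare only at code-unit boundaries: an unaligned search for the
    # 2-byte newline can match bytes that straddle two characters.
    # Surrogate halves (0xD800-0xDFFF) can never equal the newline unit.
    for pos in range(0, len(data) - width + 1, width):
        if data[pos:pos + width] == newline:
            yield data[start:pos]
            start = pos + width
    if start < len(data):
        yield data[start:]  # last line had no trailing newline


def read_utf16_lines(data):
    """Decode a UTF-16 buffer into a list of lines (hypothetical helper)."""
    encoding, payload = detect_utf16(data)
    lines = []
    for chunk in split_encoded_lines(payload, encoding):
        line = chunk.decode(encoding)
        if line.endswith('\r'):  # tolerate CRLF line endings
            line = line[:-1]
        lines.append(line)
    return lines


# Round trip: no record is lost or merged, for either byte order.
text = 'id,name\r\n1,Zoë\n2,😀\n'
for bom, codec in ((codecs.BOM_UTF16_LE, 'utf-16-le'),
                   (codecs.BOM_UTF16_BE, 'utf-16-be')):
    assert read_utf16_lines(bom + text.encode(codec)) == ['id,name', '1,Zoë', '2,😀']
```

Each decoded line could then be handed to `csv.reader` for the CSV case; note that any purely line-based split, this one included, cannot handle quoted CSV fields that contain embedded newlines.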

> Read UTF-16 CSV and Text files
> ------------------------------
>
>                 Key: BEAM-5856
>                 URL: https://issues.apache.org/jira/browse/BEAM-5856
>             Project: Beam
>          Issue Type: Wish
>          Components: sdk-py-core
>            Reporter: Paul Velthuis
>            Assignee: Ahmet Altay
>            Priority: Major
>
> At the moment, Apache Beam Python version 2.7 cannot read UTF-16 text files 
> or CSV files. I would like to be able to read files in this encoding. The 
> limitation that UTF-16 cannot be read is described in: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py]
>  
> If anyone has an idea of how to fix it, I would be more than happy to assist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
