[ 
https://issues.apache.org/jira/browse/BEAM-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032838#comment-16032838
 ] 

Tim Allison commented on BEAM-2328:
-----------------------------------

bq. The last thing I'd like to investigate for a start is to check what may 
need to be done around non UTF-8 charsets. I don't expect TikaReader producing 
anything else but Strings though.

??? TikaReader should only produce Strings.  Let me know if you're finding 
otherwise! :)

> Introduce Apache Tika Input component
> -------------------------------------
>
>                 Key: BEAM-2328
>                 URL: https://issues.apache.org/jira/browse/BEAM-2328
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
>
>
> Apache Tika is a popular project that offers an extensive support for parsing 
> the variety of file formats. It is used in many projects including Lucene and 
> Elastic Search. 
> Supporting a Tika Input (Read) at the Beam level would be of major interest 
> to many users.
> PR is to follow



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to