Hi Tim

I just used 'mvn install -DskipTests=true' to quickly build it, and did 'mvn clean install' inside the tika module.

I use Eclipse. The Beam docs on how to set it up are good, but it has not quite worked for me yet for all of Beam; I only managed to import the individual Tika module.
Cheers, Sergey
On 25/05/17 19:30, Allison, Timothy B. wrote:
Awesome!

Any tips on building Beam?  Should it work on (dare I say) Windows?

IntelliJ is complaining that it can't find jdk.tools:jdk.tools:1.6 as a dependency under many of the Hadoop modules.

mvn clean install is failing at Beam::SDKS::Java::Core


[ERROR]   AvroIOTest.testWriteDisplayData:561
Expected: display data with item: (with key is "filePrefix" and with type is <STRING> and with value is "/foo")
      but: found 6 non-matching item(s):
<[]org.apache.beam.sdk.io.AvroIO$Write:codec=snappy
[]org.apache.beam.sdk.io.AvroIO$Write:schema=org.apache.beam.sdk.io.AvroIOTest$GenericClass
[]org.apache.beam.sdk.io.AvroIO$Write:fileSuffix=bar
[]org.apache.beam.sdk.io.AvroIO$Write:numShards=100
[]org.apache.beam.sdk.io.AvroIO$Write:shardNameTemplate=-SS-of-NN-
[]org.apache.beam.sdk.io.AvroIO$Write:filePrefix=C:\foo>
[ERROR]   FileBasedSinkTest.testRemoveWithTempFilename:148->testRemoveTemporaryFiles:261 temp file C:\Users\tallison\AppData\Local\Temp\junit5212433513605155196\temp\file0 exists
Expected: is <false>
      but: was <true>
[ERROR]   FileBasedSourceTest.testSplittingFailsOnEmptyFileExpansion
Expected: (an instance of java.io.FileNotFoundException and exception with message a string containing "No files found for spec: C:\Users\tallison\AppData\Local\Temp\junit1719865221821921346\junit7087025770573441186/missing.txt")
      but: an instance of java.io.FileNotFoundException <java.lang.IllegalStateException: Unable to find registrar for c> is a java.lang.IllegalStateException
Stacktrace was: java.lang.IllegalStateException: Unable to find registrar for c
         at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
         at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)


among many other errors...
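
One possible reading of the "Unable to find registrar for c" failure above is that the drive letter in a Windows path such as C:\... is being taken for a URI scheme. The Java sketch below illustrates that idea only. The regular expression, the class name SchemeSketch, the method naiveScheme, and the sample paths are all assumptions made for illustration; this is not Beam's actual code.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeSketch {
    // Assumed pattern: an RFC 3986-style scheme (letter, then letters,
    // digits, '+', '-', '.') followed by a colon and the rest of the spec.
    private static final Pattern SCHEME =
            Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):.*");

    // Hypothetical helper: returns the lower-cased scheme if the spec
    // looks like "scheme:...", otherwise falls back to "file".
    public static String naiveScheme(String spec) {
        Matcher m = SCHEME.matcher(spec);
        return m.matches() ? m.group("scheme").toLowerCase(Locale.ROOT) : "file";
    }

    public static void main(String[] args) {
        // Hypothetical sample specs, not taken from the failing tests.
        System.out.println(naiveScheme("C:\\temp\\missing.txt"));   // prints "c"
        System.out.println(naiveScheme("/tmp/missing.txt"));        // prints "file"
        System.out.println(naiveScheme("gs://bucket/missing.txt")); // prints "gs"
    }
}
```

If that reading is right, a spec that starts with a drive letter would be looked up under the scheme "c", for which no file system is registered, which would be consistent with the IllegalStateException in the log.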
-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]]
Sent: Thursday, May 25, 2017 12:47 PM
To: Allison, Timothy B. <[email protected]>; [email protected]
Subject: Re: Integrating Tika with Apache Beam

Hi Guys

The link to the initial code is available in JIRA. At this stage the focus is
on preparing a solid initial PR, and then we can all improve the Tika-related
code :-)

Cheers, Sergey
On 24/05/17 11:41, Sergey Beryozkin wrote:
Hi Tim, All,

I thought I'd start a dedicated thread.

I added some initial comments to [1]; I'm now quite close to creating
the initial PR.

Thanks, Sergey

[1] https://issues.apache.org/jira/browse/BEAM-2328
On 23/05/17 17:42, Allison, Timothy B. wrote:
Another idea...if you have any interest, it would be great to get
Apache Beam set up on our Rackspace VM (with Spark?) and use it for
our regression tests?

-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]]
Sent: Friday, May 19, 2017 4:21 PM
To: [email protected]
Subject: Re: Extracting Text from embedded images in PDF docs

Hi Tim

Sure, once I get an initial PR ready I'll send an update explaining
what I did as a start, and we can discuss it further.



--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/


