Hi Tim
I just ran 'mvn install -DskipTests=true' to build it quickly, and then ran
'mvn clean install' inside the Tika module.
I use Eclipse. The Beam docs on how to set it up are good, but it has
not quite worked for me yet for all of Beam; so far I have only managed to
import the individual Tika module.
Cheers, Sergey
On 25/05/17 19:30, Allison, Timothy B. wrote:
Awesome!
Any tips on building Beam? Should it work on (dare I say) Windows?
IntelliJ is complaining that it can't find jdk.tools:jdk.tools:1.6 as a
dependency under many of the Hadoop modules.
mvn clean install is failing at Beam::SDKS::Java::Core
[ERROR] AvroIOTest.testWriteDisplayData:561
Expected: display data with item: (with key is "filePrefix" and with type is <STRING> and
with value is "/foo")
but: found 6 non-matching item(s):
<[]org.apache.beam.sdk.io.AvroIO$Write:codec=snappy
[]org.apache.beam.sdk.io.AvroIO$Write:schema=org.apache.beam.sdk.io.AvroIOTest$GenericClass
[]org.apache.beam.sdk.io.AvroIO$Write:fileSuffix=bar
[]org.apache.beam.sdk.io.AvroIO$Write:numShards=100
[]org.apache.beam.sdk.io.AvroIO$Write:shardNameTemplate=-SS-of-NN-
[]org.apache.beam.sdk.io.AvroIO$Write:filePrefix=C:\foo>
[ERROR]
FileBasedSinkTest.testRemoveWithTempFilename:148->testRemoveTemporaryFiles:261
temp file C:\Users\tallison\AppData\Local\Temp\junit5212433513605155196\temp\file0
exists
Expected: is <false>
but: was <true>
[ERROR] FileBasedSourceTest.testSplittingFailsOnEmptyFileExpansion
Expected: (an instance of java.io.FileNotFoundException and exception with message a
string containing "No files found for spec:
C:\Users\tallison\AppData\Local\Temp\junit1719865221821921346\junit7087025770573441186/missing.txt")
but: an instance of java.io.FileNotFoundException
<java.lang.IllegalStateException: Unable to find registrar for c> is a
java.lang.IllegalStateException
Stacktrace was: java.lang.IllegalStateException: Unable to find registrar for c
at
org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
among many other errors...
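The "Unable to find registrar for c" message above suggests that the drive letter of a Windows path is being read as a URI scheme. The following is only a minimal sketch of how that could happen, not Beam's actual code: the class name SchemeSketch, the regex, and the "file" fallback are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeSketch {
    // Hypothetical pattern (an assumption, not taken from Beam): a URI-style
    // scheme is a letter followed by letters, digits, '+', '-' or '.', then ':'.
    private static final Pattern SCHEME =
        Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):.*");

    // Returns the lower-cased scheme, or "file" when the spec has no scheme.
    static String parseScheme(String spec) {
        Matcher m = SCHEME.matcher(spec);
        if (!m.matches()) {
            return "file";
        }
        return m.group("scheme").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // A Windows path: the drive letter "C" matches the scheme pattern.
        System.out.println(parseScheme("C:\\foo"));
        // A Unix-style path has no colon, so it falls back to "file".
        System.out.println(parseScheme("/foo"));
        // A genuine URI with an explicit scheme.
        System.out.println(parseScheme("hdfs://namenode/data"));
    }
}
```

Under these assumptions the Windows path yields the scheme "c", for which no file system would be registered, while the same path on a Unix-like system resolves to the local "file" scheme.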
-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]]
Sent: Thursday, May 25, 2017 12:47 PM
To: Allison, Timothy B. <[email protected]>; [email protected]
Subject: Re: Integrating Tika with Apache Beam
Hi Guys
The link to the initial code is available in JIRA. At this stage the focus is
on preparing a solid initial PR, and then we can all improve the Tika-related
code :-)
Cheers, Sergey
On 24/05/17 11:41, Sergey Beryozkin wrote:
Hi Tim, All,
I thought I'd start a dedicated thread.
I added some initial comments to [1], and I'm now quite close to creating
the initial PR.
Thanks, Sergey
[1] https://issues.apache.org/jira/browse/BEAM-2328
On 23/05/17 17:42, Allison, Timothy B. wrote:
Another idea... if you have any interest, it would be great to get
Apache Beam set up on our Rackspace VM (with Spark?) and use it for
our regression tests.
-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]]
Sent: Friday, May 19, 2017 4:21 PM
To: [email protected]
Subject: Re: Extracting Text from embedded images in PDF docs
Hi Tim
Sure. Once I have an initial PR ready, I'll send an update explaining
what I did as a start, and we can discuss it further.
--
Sergey Beryozkin
Talend Community Coders
http://coders.talend.com/