Awesome!

Any tips on building Beam?  Should it work on (dare I say) Windows?

IntelliJ is complaining that it can't find jdk.tools:jdk.tools:1.6 as a
dependency in many of the Hadoop modules.

mvn clean install is failing at Beam::SDKS::Java::Core with:


[ERROR]   AvroIOTest.testWriteDisplayData:561
Expected: display data with item: (with key is "filePrefix" and with type is <STRING> and with value is "/foo")
     but: found 6 non-matching item(s):
<[]org.apache.beam.sdk.io.AvroIO$Write:codec=snappy
[]org.apache.beam.sdk.io.AvroIO$Write:schema=org.apache.beam.sdk.io.AvroIOTest$GenericClass
[]org.apache.beam.sdk.io.AvroIO$Write:fileSuffix=bar
[]org.apache.beam.sdk.io.AvroIO$Write:numShards=100
[]org.apache.beam.sdk.io.AvroIO$Write:shardNameTemplate=-SS-of-NN-
[]org.apache.beam.sdk.io.AvroIO$Write:filePrefix=C:\foo>
[ERROR]   FileBasedSinkTest.testRemoveWithTempFilename:148->testRemoveTemporaryFiles:261 temp file C:\Users\tallison\AppData\Local\Temp\junit5212433513605155196\temp\file0 exists
Expected: is <false>
     but: was <true>
[ERROR]   FileBasedSourceTest.testSplittingFailsOnEmptyFileExpansion
Expected: (an instance of java.io.FileNotFoundException and exception with message a string containing "No files found for spec: C:\Users\tallison\AppData\Local\Temp\junit1719865221821921346\junit7087025770573441186/missing.txt")
     but: an instance of java.io.FileNotFoundException <java.lang.IllegalStateException: Unable to find registrar for c> is a java.lang.IllegalStateException
Stacktrace was: java.lang.IllegalStateException: Unable to find registrar for c
        at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
        at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)


among many other errors...
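The last failure ("Unable to find registrar for c") looks consistent with the drive letter in a Windows path such as C:\Users\... being read as a URI scheme. This is a minimal, hypothetical sketch: the class SchemeSketch, the method parseScheme and both regexes are assumptions for illustration, not Beam's actual FileSystems code. It only shows how an RFC 3986-style scheme pattern (a letter followed by letters, digits, "+", "-" or ".", then ":") yields "c" for a Windows path, and how requiring "://" after the scheme would let such a path fall back to a local-file scheme.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeSketch {
  // Hypothetical naive pattern: RFC 3986 scheme syntax followed by ':' and anything.
  // A Windows drive letter ("C:") satisfies it, so "C:\..." gets scheme "c".
  static final Pattern NAIVE_SCHEME =
      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):.*");

  // Hypothetical stricter pattern: require "://" after the scheme,
  // so "C:\..." no longer matches and is treated as a local path.
  static final Pattern STRICT_SCHEME =
      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://.*");

  static String parseScheme(Pattern pattern, String spec) {
    Matcher m = pattern.matcher(spec);
    if (!m.matches()) {
      // No scheme recognized: assume a local file path.
      return "file";
    }
    return m.group("scheme").toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    String windowsSpec = "C:\\Users\\tallison\\AppData\\Local\\Temp\\missing.txt";
    System.out.println(parseScheme(NAIVE_SCHEME, windowsSpec));             // c
    System.out.println(parseScheme(STRICT_SCHEME, windowsSpec));            // file
    System.out.println(parseScheme(STRICT_SCHEME, "gs://bucket/file.txt")); // gs
  }
}
```

If the real SDK does something similar, a lookup keyed on "c" would find no registered file system, which matches the IllegalStateException above; the "/foo" vs "C:\foo" display-data mismatch looks like a related Windows path-resolution difference.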
-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]] 
Sent: Thursday, May 25, 2017 12:47 PM
To: Allison, Timothy B. <[email protected]>; [email protected]
Subject: Re: Integrating Tika with Apache Beam

Hi Guys

The link to the initial code is available in JIRA. At this stage the focus is
on preparing a solid initial PR, and then we can all improve the Tika-related
code :-)

Cheers, Sergey
On 24/05/17 11:41, Sergey Beryozkin wrote:
> Hi Tim, All,
> 
> I thought I'd start a dedicated thread.
> 
> I added some initial comments to [1]; I'm now quite close to creating
> the initial PR.
> 
> Thanks, Sergey
> 
> [1] https://issues.apache.org/jira/browse/BEAM-2328
> On 23/05/17 17:42, Allison, Timothy B. wrote:
>> Another idea...if you have any interest, it would be great to get 
>> Apache Beam set up on our Rackspace VM (with Spark?) and use it for 
>> our regression tests?
>>
>> -----Original Message-----
>> From: Sergey Beryozkin [mailto:[email protected]]
>> Sent: Friday, May 19, 2017 4:21 PM
>> To: [email protected]
>> Subject: Re: Extracting Text from embedded images in PDF docs
>>
>> Hi Tim
>>
>> Sure. Once I have an initial PR ready, I'll send an update explaining
>> what I did as a start, and we can discuss it further.
>>


--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/
