Ha... Beam doesn't currently work on Windows: https://issues.apache.org/jira/browse/BEAM-2299
-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: Thursday, May 25, 2017 2:30 PM
To: Sergey Beryozkin <[email protected]>; [email protected]
Subject: RE: Integrating Tika with Apache Beam

Awesome! Any tips on building Beam? Should it work on (dare I say) Windows?

IntelliJ is complaining that it can't find jdk.tools:jdk.tools:1.6 as a dependency under many of the Hadoop modules.

mvn clean install is failing at Beam::SDKS::Java::Core:

[ERROR] AvroIOTest.testWriteDisplayData:561 Expected: display data with item: (with key is "filePrefix" and with type is <STRING> and with value is "/foo")
     but: found 6 non-matching item(s):
<[]org.apache.beam.sdk.io.AvroIO$Write:codec=snappy
 []org.apache.beam.sdk.io.AvroIO$Write:schema=org.apache.beam.sdk.io.AvroIOTest$GenericClass
 []org.apache.beam.sdk.io.AvroIO$Write:fileSuffix=bar
 []org.apache.beam.sdk.io.AvroIO$Write:numShards=100
 []org.apache.beam.sdk.io.AvroIO$Write:shardNameTemplate=-SS-of-NN-
 []org.apache.beam.sdk.io.AvroIO$Write:filePrefix=C:\foo>

[ERROR] FileBasedSinkTest.testRemoveWithTempFilename:148->testRemoveTemporaryFiles:261 temp file C:\Users\tallison\AppData\Local\Temp\junit5212433513605155196\temp\file0 exists
Expected: is <false>
     but: was <true>

[ERROR] FileBasedSourceTest.testSplittingFailsOnEmptyFileExpansion
Expected: (an instance of java.io.FileNotFoundException and exception with message a string containing "No files found for spec: C:\Users\tallison\AppData\Local\Temp\junit1719865221821921346\junit7087025770573441186/missing.txt")
     but: an instance of java.io.FileNotFoundException <java.lang.IllegalStateException: Unable to find registrar for c> is a java.lang.IllegalStateException
Stacktrace was: java.lang.IllegalStateException: Unable to find registrar for c
	at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
	at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)

...among many other errors.
-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]]
Sent: Thursday, May 25, 2017 12:47 PM
To: Allison, Timothy B. <[email protected]>; [email protected]
Subject: Re: Integrating Tika with Apache Beam

Hi Guys,

The link to the initial code is available in JIRA. At this stage the focus is on preparing a solid initial PR, and then we can all improve the Tika-related code :-)

Cheers, Sergey

On 24/05/17 11:41, Sergey Beryozkin wrote:
> Hi Tim, All,
>
> I thought I'd start a dedicated thread.
>
> I added some initial comments to [1], I'm quite close now to creating
> the initial PR.
>
> Thanks, Sergey
>
> [1] https://issues.apache.org/jira/browse/BEAM-2328
> On 23/05/17 17:42, Allison, Timothy B. wrote:
>> Another idea...if you have any interest, it would be great to get
>> Apache Beam set up on our Rackspace VM (with Spark?) and use it for
>> our regression tests?
>>
>> -----Original Message-----
>> From: Sergey Beryozkin [mailto:[email protected]]
>> Sent: Friday, May 19, 2017 4:21 PM
>> To: [email protected]
>> Subject: Re: Extracting Text from embedded images in PDF docs
>>
>> Hi Tim
>>
>> Sure, once I get an initial PR ready I'll send an update and I'll
>> explain what I did for a start and we will discuss it further

-- 
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/
