I've run into one problem while testing Tika 1.8 with Bixo.

It's a dependency issue involving (of course) Guava, since that project
loves to break its API :(

The bixo-core jar has these transitive dependencies on various versions of 
Guava:

Hadoop - 11.0.2
Cascading - 14.0.1
Tika-parsers - 10.0.1
        cdm - 17.0

After dependency resolution, everyone winds up using version 10.0.1 (note that
Tika depends on cdm, which wants 17.0).
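To confirm which Guava actually won on a given classpath, you can ask the JVM where it loaded a Guava class from. This is only a minimal sketch, assuming you run it with the same classpath as the failing job; the class name GuavaLocator is hypothetical. It also probes for com.google.common.cache.LoadingCache, which as far as I know first appeared in Guava 11.0, so "false" is a strong hint that a 10.x jar is in use.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic: reports which jar a class was loaded from. */
public class GuavaLocator {

    /** Returns true if the loader can find the class (without initializing it). */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * Returns the jar or directory the class came from, "absent" if it can't be
     * loaded, or "unknown" if the JVM reports no code source (e.g. JDK classes).
     */
    static String locationOf(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "unknown";
            }
            URL url = src.getLocation();
            return url.toString();
        } catch (ClassNotFoundException e) {
            return "absent";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = GuavaLocator.class.getClassLoader();
        // CacheBuilder exists in both Guava 10.x and 11.x, so it shows which jar won.
        System.out.println("CacheBuilder loaded from: "
                + locationOf("com.google.common.cache.CacheBuilder", loader));
        // LoadingCache was (to my knowledge) added in 11.0; false suggests 10.x.
        System.out.println("LoadingCache present: "
                + isPresent("com.google.common.cache.LoadingCache", loader));
    }
}
```

Running it inside the job (or from a unit test with the job's classpath) avoids guessing from the dependency tree.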

The problem is that Hadoop (any recent version) calls an API from Guava's
cache implementation that doesn't exist in 10.0.1 (LoadingCache was only
added in Guava 11.0):

java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
        at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
        at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
        at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
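This is a link-time failure: the JVM resolves Hadoop's call by the exact descriptor shown in the trace, return type included. As I understand it, Guava 10.x had a build(CacheLoader) method that returned Cache rather than LoadingCache, so a method with the right name and argument isn't enough. Here's a rough reflection-based check for one specific signature; the helper name hasMethod is hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical helper: does a class expose a public method with this exact signature? */
public class DescriptorCheck {

    /**
     * Returns true if className has a public method with the given name, return type
     * and parameter types (fully qualified names, e.g. "int" or "java.lang.String").
     * Like the JVM's method resolution, it matches the return type, not just the name.
     */
    static boolean hasMethod(ClassLoader loader, String className, String methodName,
                             String returnType, String... paramTypes) {
        final Class<?> c;
        try {
            c = Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            return false;
        }
        for (Method m : c.getMethods()) {
            if (!m.getName().equals(methodName)) continue;
            if (!m.getReturnType().getName().equals(returnType)) continue;
            String[] actual = Arrays.stream(m.getParameterTypes())
                    .map(Class::getName)
                    .toArray(String[]::new);
            if (Arrays.equals(actual, paramTypes)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ClassLoader loader = DescriptorCheck.class.getClassLoader();
        // The signature Hadoop's CodecPool was compiled against (from the stack trace).
        boolean ok = hasMethod(loader,
                "com.google.common.cache.CacheBuilder", "build",
                "com.google.common.cache.LoadingCache",
                "com.google.common.cache.CacheLoader");
        System.out.println("build(CacheLoader) -> LoadingCache available: " + ok);
    }
}
```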

So anyone trying to use Tika with Hadoop will need to play games with the
class loader to get the older version of Guava, though that can cause other
issues if Hadoop (or Cascading, etc.) relies on anything that's only in the
newer Guava API.
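For reference, the usual form of those class loader games is to load one copy of the library through its own loader whose parent is the bootstrap loader, so it can't see the application's Guava. A minimal sketch only; the class and method names are hypothetical, and shading (relocating Guava's packages at build time) is often the less painful route.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical helper for isolating a library such as a specific Guava jar. */
public class IsolatedLoader {

    /**
     * Creates a loader that sees only the given jars plus the JDK bootstrap classes.
     * A null parent skips the application class path, so a Guava class loaded here
     * can't collide with the application's own Guava.
     */
    static URLClassLoader isolated(URL... jars) {
        return new URLClassLoader(jars, null);
    }

    /** Returns true if the loader can find the class (without initializing it). */
    static boolean canLoad(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        try (URLClassLoader loader = isolated()) {
            // JDK classes are still visible through the bootstrap loader...
            System.out.println("java.lang.String: " + canLoad(loader, "java.lang.String"));
            // ...but classes on the normal application class path are not.
            System.out.println("IsolatedLoader: " + canLoad(loader, "IsolatedLoader"));
        }
    }
}
```

To use a particular Guava jar you'd pass its URL, e.g. isolated(new java.io.File("/path/to/guava-10.0.1.jar").toURI().toURL()) with a hypothetical path. The catch is that code loaded this way can only talk to the rest of the app through JDK types or reflection, which is part of why it causes other issues.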

Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was released about 3
years ago. So it seems like we should upgrade to at least 11.0.2.

But I don't know if this is enough of an issue to require another RC.

-- Ken

PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track this.


> From: Tyler Palsulich
> Sent: April 13, 2015 10:56:29am PDT
> To: dev@tika.apache.org, u...@tika.apache.org
> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
> 
> Hi Folks,
> 
> A candidate for the Tika 1.8 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
> 
> The release candidate is a zip archive of the sources in:
>   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
> 
> The SHA1 checksum of the archive is
>   5e22fee9079370398472e59082d171ae2d7fdd31.
> 
> In addition, a staged maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachetika-1009
> 
> Please vote on releasing this package as Apache Tika 1.8. The vote is open 
> for the next 72 hours and passes if a majority of at least three +1 Tika PMC 
> votes are cast.
> 
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
> 
> Thanks,
> Tyler


--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr