I’m not so keen on fundamentally changing the organization of Tika until 2.x. This seems like a major change to me in the way people expect to consume Tika.
Can we:

1. Release a 1.11 that doesn’t include these types of changes
2. After 1.11, change trunk to be 2.0-SNAPSHOT and work those types of issues there?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Yaniv Kunda <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, September 23, 2015 at 9:30 AM
To: "[email protected]" <[email protected]>
Subject: Re: [DISCUSS] Release Tika 1.11?

>+1 for the uber jar!
>
>Regarding jdk7 issues, I have a few more I will create and patch later
>tonight - I'll post a list of issues as well.
>On Sep 23, 2015 5:26 PM, "Konstantin Gribov" <[email protected]> wrote:
>
>> Tim, was your check for File#getName done manually or it's present in
>> tests somehow? If it's present in tests we can check it on major
>> platforms (I can test on linux, win xp and maybe on macosx) with
>> different jdks.
>>
>> In case commons-io doesn't support ':' as file separator we can have
>> simple utility class in Tika or send them a patch for it.
>>
>> I think, we can rethink Tika packaging in 1.11/1.12 and produce these
>> artifacts:
>> - tika-core w/ dependency on commons-io (and deprecate most of
>>   o.a.tika.io, forwarding calls to jdk or commons-io),
>> - tika-core-uber w/ shaded commons-io (rename and drop all things
>>   unnecessary for o.a.tika.io),
>> - sliced tika-parsers-* as Bob suggested earlier,
>> - tika-parsers jar w/ all tika-parsers-* parts (for compatibility),
>> - other tika-* artifacts (like tika-server, tika-app etc).
>>
>> One who needs tika-core without dependencies would use tika-core-uber
>> instead of it, all others, who prefer using maven/ivy/gradle/sbt/lein
>> will depend on tika-core.
>> And we can drop o.a.tika.io in 2.0.
>>
>> Also, I'll take a look at unresolved jdk7 issues/patches today.
>>
>> On Tue, Sep 22, 2015 at 15:41, Allison, Timothy B. <[email protected]> wrote:
>>
>> > Thank _you_ for all of your work in modernizing us. With your efforts,
>> > we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon. :)
>> >
>> > >>Regarding FilenameUtils.getName() - I believe that its functionality
>> > >>can be replaced by Path.getFileName() - and in a platform-aware
>> > >>manner, as each JVM distribution comes with a specific provider
>> > >>implementation for the OS it's for.
>> >
>> > I agree that we should use that anytime we're interacting with the file
>> > system.
>> >
>> > However, that's actually the problem for paths that are stored within
>> > the document (say, an embedded resource). Let's say a user creates a
>> > file on Windows, the file path information for the embedded file
>> > (depending on the parser and the file format) may be in Windows-ese,
>> > which is a problem if you try to use Path.getFileName() (I think... I
>> > haven't actually tested this) on a Linux machine. I have actually
>> > tested this with the old File getName(), and it did not work
>> > cross-platform IIRC.
>> >
>> > In short, Tika needs to have the ability to extract the file name from
>> > a path that was created on any platform (including old Mac and its ":"
>> > separator) while Tika is running on any platform.
>> >
>> > -----Original Message-----
>> > From: Yaniv Kunda [mailto:[email protected]]
>> > Sent: Monday, September 21, 2015 11:31 AM
>> > To: [email protected]
>> > Subject: RE: [DISCUSS] Release Tika 1.11?
>> >
>> > Thanks for the positive spirit!
>> >
>> > Regarding FilenameUtils.getName() - I believe that its functionality
>> > can be replaced by Path.getFileName() - and in a platform-aware manner,
>> > as each JVM distribution comes with a specific provider implementation
>> > for the OS it's for.
>> >
>> > -----Original Message-----
>> > From: Allison, Timothy B. [mailto:[email protected]]
>> > Sent: Monday, September 21, 2015 14:27
>> > To: [email protected]
>> > Subject: RE: [DISCUSS] Release Tika 1.11?
>> >
>> > +1, it would be great to move a bit more into EOL'd Java 7 asap.
>> >
>> > I'll take TIKA-1734 by tomorrow EDT.
>> >
>> > As for the other 2, I'm personally ok waiting for 1.12, but I defer to
>> > the dev community.
>> >
>> > Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on
>> > TIKA-1726, that might help move things forward.
>> >
>> > On TIKA-1706, I share Nick's and Jukka's caution, and I also share
>> > Yaniv's point about duplication of code, bloat within Tika and missing
>> > out on updates. Aside from one small bit of code I'd like to keep or
>> > perhaps try to move into commons-io (?)[1], I think I'm now +1 to going
>> > forward with TIKA-1706 in core...unless there is a -1 from the
>> > community.
>> >
>> > Best,
>> >
>> > Tim
>> >
>> >
>> > [1] I added some customizations for old Mac OS behavior (treat ":" as
>> > file separator) in FilenameUtils.getName() that I don't want to lose.
>> >
>> >
>> > -----Original Message-----
>> > From: Yaniv Kunda [mailto:[email protected]]
>> > Sent: Sunday, September 20, 2015 7:15 AM
>> > To: [email protected]
>> > Subject: RE: [DISCUSS] Release Tika 1.11?
>> >
>> > I would really like to push the following:
>> >
>> > https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io
>> > to tika-core
>> > This requires a decision to re-include commons-io as a dependency of
>> > tika-core.
>> > All the pros and cons have been already debated, but no decision has
>> > been made.
>> >
>> > https://issues.apache.org/jira/browse/TIKA-1726 - Augment public
>> > methods that use a java.io.File with methods that use a
>> > java.nio.file.Path
>> > Since this adds new methods to the public API, I requested the group to
>> > make a decision about the new names - but have not received something
>> > definite.
>> > However, I did create a subtask -
>> > https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path
>> > in TemporaryResources - using [~tallison]'s suggestion, which has not
>> > been committed yet.
>> >
>> > If decisions are made on the above issues, I can quickly create patches
>> > for them.
>> >
>> > -----Original Message-----
>> > From: Mattmann, Chris A (3980) [mailto:[email protected]]
>> > Sent: Saturday, September 19, 2015 08:10
>> > To: [email protected]
>> > Subject: [DISCUSS] Release Tika 1.11?
>> >
>> > Hey Guys and Gals,
>> >
>> > I’d like to roll a 1.11 release. There is TIKA-1716 which in particular
>> > allows some neat functionality in tika-python:
>> > https://github.com/chrismattmann/tika-python/pull/67
>> >
>> >
>> > Anything else to try and get into the release?
>> >
>> > If not, I’ll produce an RC #1 by end of weekend.
>> >
>> > Cheers,
>> > Chris
>> >
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > Chris Mattmann, Ph.D.
>> > Chief Architect
>> > Instrument Software and Science Data Systems Section (398)
>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > Office: 168-519, Mailstop: 168-527
>> > Email: [email protected]
>> > WWW: http://sunset.usc.edu/~mattmann/
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > Adjunct Associate Professor, Computer Science Department
>> > University of Southern California, Los Angeles, CA 90089 USA
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >
>> > --
>> >
>> >
>> > This email communication (including any attachments) contains
>> > information from Answers Corporation or its affiliates that is
>> > confidential and may be privileged. The information contained herein is
>> > intended only for the use of the addressee(s) named above. If you are
>> > not the intended recipient (or the agent responsible to deliver it to
>> > the intended recipient), you are hereby notified that any
>> > dissemination, distribution, use, or copying of this communication is
>> > strictly prohibited. If you have received this email in error, please
>> > immediately reply to sender, delete the message and destroy all copies
>> > of it. If you have questions, please email [email protected].
>> >
>> > If you wish to unsubscribe to commercial emails from Answers and its
>> > affiliates, please go to the Answers Subscription Center
>> > http://campaigns.answers.com/subscriptions to opt out. Thank you.
>> >
>> --
>> Best regards,
>> Konstantin Gribov
>>
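
For reference, here is a minimal sketch of the requirement Tim describes in the thread: pulling the file name out of a path that was written on any platform (Windows, Unix, or old Mac OS with its ":" separator), regardless of the platform Tika is running on. This is not the code in Tika's FilenameUtils.getName() or in commons-io. The class name EmbeddedPathNames and its getName method are hypothetical, and the sketch assumes that '/', '\' and ':' can always be treated as separators, which means a Unix file name that legitimately contains ':' or '\' would be cut short.

```java
// Hypothetical helper, for illustration only; not part of Tika or commons-io.
final class EmbeddedPathNames {

    private EmbeddedPathNames() {
    }

    /**
     * Returns everything after the last '/', '\' or ':' in the given path.
     * The result does not depend on the platform the JVM runs on, unlike
     * java.io.File#getName() or java.nio.file.Path#getFileName(), which
     * only recognize the separators of the local file system.
     */
    static String getName(String path) {
        if (path == null) {
            return "";
        }
        int lastSeparator = -1;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            // Assumption: all three characters are separators, whatever
            // platform produced the path.
            if (c == '/' || c == '\\' || c == ':') {
                lastSeparator = i;
            }
        }
        return path.substring(lastSeparator + 1);
    }

    public static void main(String[] args) {
        System.out.println(getName("C:\\docs\\report.doc"));          // report.doc
        System.out.println(getName("Macintosh HD:docs:report.doc"));  // report.doc
        System.out.println(getName("/home/user/report.doc"));         // report.doc
    }
}
```

A helper of this kind would only be meant for path strings stored inside a document (for example the name of an embedded resource). For paths on the local file system, Path.getFileName() remains the platform-aware choice, as Yaniv suggests.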
