+1 for the branching strategy. With respect to slicing up the parsers, it would be great to have more discussion on how the parsers should be organized. I think Tim has a draft out on this mailing list that would benefit from some additional perspectives. Really cool to be talking about doing this!
- Bob

On Wed, Sep 23, 2015 at 12:36 PM, Konstantin Gribov <[email protected]> wrote:

> It seems to be a good idea to avoid inclusion of commons-io into tika-core
> till 2.0 if we will release it in several months.
> In this case we will have trunk w/ ongoing development of 2.0-SNAPSHOT and
> a branch for 1.11+ bugfixes.
>
> Some changes related to java7 can be included in 1.11/1.12 with no
> problems.
>
> Wed, Sep 23, 2015 at 19:33, Mattmann, Chris A (3980) <[email protected]>:
>
> > I’m not so keen on fundamentally changing the organization of
> > Tika until 2.x. This seems like a major change to me in the
> > way people expect to consume Tika.
> >
> > Can we:
> >
> > 1. Release a 1.11 that doesn’t include these types of changes
> > 2. After 1.11, change trunk to be 2.0-SNAPSHOT and work those
> >    types of issues there?
> >
> > Cheers,
> > Chris
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: [email protected]
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > -----Original Message-----
> > From: Yaniv Kunda <[email protected]>
> > Reply-To: "[email protected]" <[email protected]>
> > Date: Wednesday, September 23, 2015 at 9:30 AM
> > To: "[email protected]" <[email protected]>
> > Subject: Re: [DISCUSS] Release Tika 1.11?
> >
> > >+1 for the uber jar!
> > >
> > >Regarding jdk7 issues, I have a few more I will create and patch later
> > >tonight - I'll post a list of issues as well.
> > >On Sep 23, 2015 5:26 PM, "Konstantin Gribov" <[email protected]> wrote:
> > >
> > >> Tim, was your check for File#getName done manually or is it present
> > >> in tests somehow? If it's present in tests we can check it on major
> > >> platforms (I can test on linux, win xp and maybe on macosx) with
> > >> different jdks.
> > >>
> > >> In case commons-io doesn't support ':' as file separator we can have
> > >> a simple utility class in Tika or send them a patch for it.
> > >>
> > >> I think we can rethink Tika packaging in 1.11/1.12 and produce these
> > >> artifacts:
> > >> - tika-core w/ dependency on commons-io (and deprecate most of
> > >>   o.a.tika.io, forwarding calls to jdk or commons-io),
> > >> - tika-core-uber w/ shaded commons-io (rename and drop all things
> > >>   unnecessary for o.a.tika.io),
> > >> - sliced tika-parsers-* as Bob suggested earlier,
> > >> - tika-parsers jar w/ all tika-parsers-* parts (for compatibility),
> > >> - other tika-* artifacts (like tika-server, tika-app etc).
> > >>
> > >> Anyone who needs tika-core without dependencies would use
> > >> tika-core-uber instead of it; all others, who prefer using
> > >> maven/ivy/gradle/sbt/lein, will depend on tika-core.
> > >> And we can drop o.a.tika.io in 2.0.
> > >>
> > >> Also, I'll take a look at unresolved jdk7 issues/patches today.
> > >>
> > >> Tue, Sep 22, 2015 at 15:41, Allison, Timothy B. <[email protected]>:
> > >>
> > >> > Thank _you_ for all of your work in modernizing us. With your
> > >> > efforts, we'll be able to deprecate
> > >> > TikaInputStream#get(PunchCard pc) soon. :)
> > >> >
> > >> > >>Regarding FilenameUtils.getName() - I believe that its
> > >> > >>functionality can be replaced by Path.getFileName() - and in a
> > >> > >>platform-aware manner, as each JVM distribution comes with a
> > >> > >>specific provider implementation for the OS it's for.
> > >> >
> > >> > I agree that we should use that anytime we're interacting with the
> > >> > file system.
> > >> >
> > >> > However, that's actually the problem for paths that are stored
> > >> > within the document (say, an embedded resource). Let's say a user
> > >> > creates a file on Windows: the file path information for the
> > >> > embedded file (depending on the parser and the file format) may be
> > >> > in Windows-ese, which is a problem if you try to use
> > >> > Path.getFileName() (I think... I haven't actually tested this) on a
> > >> > Linux machine. I have actually tested this with the old File
> > >> > getName(), and it did not work cross-platform IIRC.
> > >> >
> > >> > In short, Tika needs to have the ability to extract the file name
> > >> > from a path that was created on any platform (including old Mac and
> > >> > its ":" separator) while Tika is running on any platform.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Yaniv Kunda [mailto:[email protected]]
> > >> > Sent: Monday, September 21, 2015 11:31 AM
> > >> > To: [email protected]
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > Thanks for the positive spirit!
> > >> >
> > >> > Regarding FilenameUtils.getName() - I believe that its functionality
> > >> > can be replaced by Path.getFileName() - and in a platform-aware
> > >> > manner, as each JVM distribution comes with a specific provider
> > >> > implementation for the OS it's for.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Allison, Timothy B. [mailto:[email protected]]
> > >> > Sent: Monday, September 21, 2015 14:27
> > >> > To: [email protected]
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > +1, it would be great to move a bit more into EOL'd Java 7 asap.
> > >> >
> > >> > I'll take TIKA-1734 by tomorrow EDT.
> > >> >
> > >> > As for the other 2, I'm personally ok waiting for 1.12, but I defer
> > >> > to the dev community.
> > >> >
> > >> > Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in
> > >> > on TIKA-1726, that might help move things forward.
> > >> >
> > >> > On TIKA-1706, I share Nick's and Jukka's caution, and I also share
> > >> > Yaniv's point about duplication of code, bloat within Tika and
> > >> > missing out on updates. Aside from one small bit of code I'd like
> > >> > to keep or perhaps try to move into commons-io (?)[1], I think I'm
> > >> > now +1 to going forward with TIKA-1706 in core...unless there is a
> > >> > -1 from the community.
> > >> >
> > >> > Best,
> > >> >
> > >> > Tim
> > >> >
> > >> > [1] I added some customizations for old Mac OS behavior (treat ":"
> > >> > as file separator) in FilenameUtils.getName() that I don't want to
> > >> > lose.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Yaniv Kunda [mailto:[email protected]]
> > >> > Sent: Sunday, September 20, 2015 7:15 AM
> > >> > To: [email protected]
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > I would really like to push the following:
> > >> >
> > >> > https://issues.apache.org/jira/browse/TIKA-1706 - Bring back
> > >> > commons-io to tika-core
> > >> > This requires a decision to re-include commons-io as a dependency
> > >> > of tika-core.
> > >> > All the pros and cons have already been debated, but no decision
> > >> > has been made.
> > >> >
> > >> > https://issues.apache.org/jira/browse/TIKA-1726 - Augment public
> > >> > methods that use a java.io.File with methods that use a
> > >> > java.nio.file.Path
> > >> > Since this adds new methods to the public API, I requested the
> > >> > group to make a decision about the new names - but have not
> > >> > received anything definite.
> > >> > However, I did create a subtask -
> > >> > https://issues.apache.org/jira/browse/TIKA-1734 Use
> > >> > java.nio.file.Path in TemporaryResources - using [~tallison]'s
> > >> > suggestion, which has not been committed yet.
> > >> >
> > >> > If decisions are made on the above issues, I can quickly create
> > >> > patches for them.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Mattmann, Chris A (3980) [mailto:[email protected]]
> > >> > Sent: Saturday, September 19, 2015 08:10
> > >> > To: [email protected]
> > >> > Subject: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > Hey Guys and Gals,
> > >> >
> > >> > I’d like to roll a 1.11 release. There is TIKA-1716 which in
> > >> > particular allows some neat functionality in tika-python:
> > >> > https://github.com/chrismattmann/tika-python/pull/67
> > >> >
> > >> > Anything else to try and get into the release?
> > >> >
> > >> > If not, I’ll produce an RC #1 by end of weekend.
> > >> >
> > >> > Cheers,
> > >> > Chris
> > >> >
> > >> > --
> > >> >
> > >> > This email communication (including any attachments) contains
> > >> > information from Answers Corporation or its affiliates that is
> > >> > confidential and may be privileged. The information contained
> > >> > herein is intended only for the use of the addressee(s) named
> > >> > above. If you are not the intended recipient (or the agent
> > >> > responsible to deliver it to the intended recipient), you are
> > >> > hereby notified that any dissemination, distribution, use, or
> > >> > copying of this communication is strictly prohibited. If you have
> > >> > received this email in error, please immediately reply to sender,
> > >> > delete the message and destroy all copies of it. If you have
> > >> > questions, please email [email protected].
> > >> >
> > >> > If you wish to unsubscribe to commercial emails from Answers and
> > >> > its affiliates, please go to the Answers Subscription Center
> > >> > http://campaigns.answers.com/subscriptions to opt out. Thank you.
> > >> >
> > >> --
> > >> Best regards,
> > >> Konstantin Gribov
> > >>
> >
> --
> Best regards,
> Konstantin Gribov
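
For reference, the requirement Tim describes in the thread (extracting a file name from a path that was written on any platform, including the old Mac ":" separator, regardless of where Tika runs) could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Tika or commons-io code: the class name CrossPlatformFileNames and its getName method are hypothetical, and the approach simply takes everything after the last '/', '\' or ':' in the string.

```java
/**
 * Hypothetical helper (not part of Tika or commons-io) that extracts the
 * last path segment from a path string written on any platform.
 */
public final class CrossPlatformFileNames {

    private CrossPlatformFileNames() {
        // static utility, no instances
    }

    /**
     * Returns the text after the last '/', '\\' or ':' in the given path.
     * Returns an empty string for null or empty input, or when the path
     * ends with a separator.
     */
    public static String getName(String path) {
        if (path == null || path.isEmpty()) {
            return "";
        }
        // Treat Unix, Windows and classic Mac OS separators the same way,
        // independent of the platform the JVM is running on.
        int lastSeparator = Math.max(
                Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\')),
                path.lastIndexOf(':'));
        // lastSeparator is -1 when there is no separator, so this returns
        // the whole string in that case.
        return path.substring(lastSeparator + 1);
    }

    public static void main(String[] args) {
        System.out.println(getName("C:\\docs\\report.doc"));
        System.out.println(getName("/home/user/report.doc"));
        System.out.println(getName("Macintosh HD:Documents:report.doc"));
    }
}
```

Because this works on plain strings instead of java.nio.file.Path, the result does not depend on the file system provider of the running JVM. The trade-off is that a legitimate ':' inside a Unix file name would be treated as a separator, so something like this would only suit paths stored inside documents, not paths on the local file system.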
