Hi Jan,

Interesting. Why couldn't we just name the file the same thing, then? Would that be leaving it up to the ClassLoader as a gamble?
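
To make the question concrete, here is a minimal Java sketch of the two-file lookup as I understand it from Jan's description: read the defaults from the classpath, then overlay an override file if one is present. The class name, the resource names, and the property keys are hypothetical and not taken from the patch. With a single shared name, ClassLoader.getResourceAsStream hands back only one match, normally the first one in the loader's search order, so the outcome would depend on classpath ordering. With two distinct names both files are read and the override wins key by key.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

public class LanguageConfigSketch {
    // Hypothetical resource names, chosen for illustration only.
    static final String DEFAULTS = "org/apache/tika/language/defaults.properties";
    static final String OVERRIDE = "org/apache/tika/language/override.properties";

    /** Reads the defaults first, then the override; either stream may be null. */
    static Properties load(InputStream defaults, InputStream override) {
        Properties props = new Properties();
        try {
            if (defaults != null) {
                props.load(defaults);
            }
            if (override != null) {
                // Keys loaded later replace keys loaded earlier.
                props.load(override);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Looks both resources up by their distinct names on the given class loader. */
    static Properties load(ClassLoader loader) {
        // A null resource is allowed in try-with-resources; it is simply not closed.
        try (InputStream d = loader.getResourceAsStream(DEFAULTS);
             InputStream o = loader.getResourceAsStream(OVERRIDE)) {
            return load(d, o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(LanguageConfigSketch.class.getClassLoader()));
    }
}
```

In this sketch a missing override file is not an error: the defaults are used unchanged, so nobody has to repack the JAR or reason about classpath order unless they actually want to change something.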

Cheers,
Chris

On 8/22/10 8:40 AM, "Jan Høydahl / Cominvent" <[email protected]> wrote:

> Hi,
>
> My rationale for the override part is as follows:
>
> The default properties file will be embedded within tika-xx.jar.
> I assume most people are not keen to unpack and repack JARs to make a
> config change.
> We COULD put a similarly named properties file at another location, but
> then the user needs to make sure that location is EARLIER in the classpath
> than the JAR file.
> In the case of e.g. Solr (Tomcat, Jetty..) it is not obvious how to ensure
> this, and to avoid any confusion about class-loader peculiarities, it's
> more straightforward to look for an override file.
>
> Take the Solr example. The user would then put the properties file along
> with his new language profiles in a folder
> $SOLR_HOME/lib/org/apache/tika/language/
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 22. aug. 2010, at 16.40, Chris A. Mattmann (JIRA) wrote:
>
>>
>> [ https://issues.apache.org/jira/browse/TIKA-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901170#action_12901170 ]
>>
>> Chris A. Mattmann commented on TIKA-490:
>> ----------------------------------------
>>
>> Hi Jan,
>>
>> I don't get the point of the override properties part. What does it buy
>> us? The way you've set it up, it also loads from the classpath just like
>> the language identifier properties file proper, so it shouldn't be any
>> more arduous to just mod that file if necessary (since they are both
>> classpath loaded).
>>
>> Let me know what you think. I've reviewed the rest of the patch; it looks
>> good and I'm ready to commit it, sans the override part.
>>
>> Cheers,
>> Chris
>>
>>
>>> Support for adding language profiles dynamically
>>> ------------------------------------------------
>>>
>>> Key: TIKA-490
>>> URL: https://issues.apache.org/jira/browse/TIKA-490
>>> Project: Tika
>>> Issue Type: Improvement
>>> Components: languageidentifier
>>> Affects Versions: 0.7
>>> Reporter: Jan Høydahl
>>> Assignee: Chris A. Mattmann
>>> Fix For: 0.8
>>>
>>> Attachments: TIKA-490.patch, TIKA-490.patch
>>>
>>> Original Estimate: 24h
>>> Remaining Estimate: 24h
>>>
>>> Currently the Tika LanguageIdentifier loads language profiles through a
>>> hardcoded static block in the java code.
>>> It would be better to make this configurable, so you could add your own
>>> languages without recompiling.
>>> Suggested approach:
>>> Remove the static code block loading all languages. Instead look for a
>>> tika.languageidentification.properties file on classpath.
>>> Now the user can simply make his/her own (additional) language profile
>>> files, put them on the classpath together with a properties file and off
>>> you go!
>>> Also, once you make it configurable, there might be an issue of having
>>> the profiles as static members, as you will force the same behaviour for
>>> the whole VM. A static Map of Maps could solve this.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>
>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
