Re: maven-indexer / Lucene

Martin Stockhammer Thu, 06 Jul 2017 02:48:55 -0700

We have different (incompatible) lucene dependencies that prevent us from 
updating the maven indexer and/or jackrabbit. And this will happen again with 
each upgrade of one of these two packages in the future. 
So it would be really good if we could find a solution that removes one of the 
lucene dependencies.
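
For reference, the shading approach Olivier describes in the quoted thread 
below could look roughly like the following maven-shade-plugin configuration 
in the maven-indexer build. This is only a sketch, not the actual build: the 
classifier name "shaded-lucene" and the artifact filter are assumptions, and 
only the relocation target package is taken from the thread.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Attach the shaded jar as an extra artifact next to the normal one. -->
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <!-- Hypothetical classifier name, chosen for illustration only. -->
        <shadedClassifierName>shaded-lucene</shadedClassifierName>
        <artifactSet>
          <includes>
            <!-- Assumption: bundle only the Lucene artifacts into the jar. -->
            <include>org.apache.lucene:*</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- Rewrites org.apache.lucene.search.BooleanClause to
                 org.apache.maven.index.shaded.lucene.search.BooleanClause -->
            <pattern>org.apache.lucene</pattern>
            <shadedPattern>org.apache.maven.index.shaded.lucene</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

A consumer such as archiva would then depend on the attached jar through that 
classifier and exclude the org.apache.lucene artifacts that maven-indexer 
pulls in, so that the indexer side only uses the relocated classes.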


Greetings

Martin


On 6 July 2017 09:36:06 CEST, Chris Graham <[email protected]> wrote:
>Can I please ask an obvious/stupid question?
>
>What is driving this need for change?
>
>From a quick read of the thread above, all of the options appear to
>introduce a lot of breaking changes, and a whole lot more uncertainty.
>
>So, what is so broken that it is driving these changes?
>
>Sent from my iPhone
>
>> On 6 Jul 2017, at 12:39 pm, Olivier Lamy <[email protected]> wrote:
>> 
>> Yup.
>> The idea is to have an extra jar produced by the maven-indexer with a
>> shaded lucene version.
>> So the lucene classes (version used by Maven indexer) will be relocated in
>> a package called org.apache.maven.index.shaded.lucene (such as
>> org.apache.maven.index.shaded.lucene.search.BooleanClause).
>> Then you exclude the lucene dependencies used by maven indexer and voila.
>> The voila is a bit optimistic and not so easy, but anyway I'm working on it
>> ATM.
>> 
>> 
>>> On 6 July 2017 at 07:08, Martin <[email protected]> wrote:
>>> 
>>> What do you mean exactly by shading? Moving to another package name?
>>> 
>>> On Wednesday, 5 July 2017, 01:19:17 CEST, Olivier Lamy wrote:
>>>> Maybe an option is to use some shading?
>>>> I'm thinking of shading the lucene packages used by maven indexer. I can
>>>> easily provide a build for that.
>>>> WDYT?
>>>> 
>>>>> On 26 June 2017 at 11:49, Olivier Lamy <[email protected]> wrote:
>>>>> Hi
>>>>> graph/document storage could be convenient (but not possible with neo4j
>>>>> as it's under a GPL license [1]).
>>>>> Well, we can add solr as an additional webapp with our jetty distribution,
>>>>> but this will be a pain for users who want to use tomcat or any other
>>>>> servlet container...
>>>>> We still need to investigate a new storage model :-)
>>>>> 
>>>>> Olivier
>>>>> [1] https://neo4j.com/licensing/
>>>>> 
>>>>>> On 25 June 2017 at 06:26, Martin <[email protected]> wrote:
>>>>>> Yes, you are right. The lucene dependency causes a lot of trouble and
>>>>>> will cause headaches with each version change of one of the dependencies.
>>>>>> What are the requirements for a replacement?
>>>>>> - We want to store hierarchical data?
>>>>>> - We want to store metadata for nodes?
>>>>>> - Fulltext search (only metadata or for artifacts too?)
>>>>>> - Blob / Artifact storage (I don't think so, but I'm not so familiar with
>>>>>> the archiva artifact model)?
>>>>>> 
>>>>>> Maybe some graph database may be an alternative. I don't know if the
>>>>>> license of neo4j is compatible with the apache license, and I think it
>>>>>> brings lucene as a dependency too. I will have a look.
>>>>>> The problem is, if fulltext search is needed, I think that for most of
>>>>>> the frameworks we get a lucene dependency, if it's embedded.
>>>>>> 
>>>>>> Other alternatives:
>>>>>> - Implement fulltext search on our own (index of the metadata stored via
>>>>>> the archiva api) and use the lucene dependency that comes from the
>>>>>> maven-indexer
>>>>>> - Jcr Oak with Solr. Solr is not embedded, must run as its own
>>>>>> application (war).
>>>>>> 
>>>>>> Greetings
>>>>>> 
>>>>>> Martin
>>>>>> 
>>>>>> On Saturday, 24 June 2017, 14:05:26 CEST, Olivier Lamy wrote:
>>>>>>> Well, this is gonna be a pain.
>>>>>>> IMHO we need to find a new alternative to jcr oak.
>>>>>>> And something not using Lucene, as it's a real pain to have different
>>>>>>> libraries using lucene when they do not update at the same time (and
>>>>>>> Lucene breaks backward compat so quickly...)
>>>>>>> Any ideas? I'd like to have something embedded (but with a possible
>>>>>>> external server configuration).
>>>>>>> There is currently a Cassandra implementation. I was not satisfied
>>>>>>> with the performance, but I guess I did that 4 years ago, so it can be
>>>>>>> improved for sure :-)
>>>>>>> Maybe orientdb?
>>>>>>> What else?
>>>>>>> 
>>>>>>>> On 24 June 2017 at 09:50, Olivier Lamy <[email protected]> wrote:
>>>>>>>> Well, the issue is incompatible versions of Lucene for Maven Indexer
>>>>>>>> and Oak (well, I can try to push a patch to Oak for upgrading...)
>>>>>>>> 
>>>>>>>>> On 24 June 2017 at 08:41, Olivier Lamy <[email protected]> wrote:
>>>>>>>>> Hi
>>>>>>>>> Maven Indexer 6.0-SNAPSHOT doesn't need the plexus bridge anymore.
>>>>>>>>> I'm working on it in the branch (feature/jcr_oak).
>>>>>>>>> Not sure why, but I have intermittent failures with the store-jcr
>>>>>>>>> module.
>>>>>>>>> I definitely agree on the upgrade.
>>>>>>>>> Well, we can simply detect that it's not oak compatible and schedule
>>>>>>>>> a full reindex (maybe with a message in logs and ui?)
>>>>>>>>> But we need to be sure we can still read the central index, and I'm
>>>>>>>>> not sure about a possible lucene conflict between oak and maven
>>>>>>>>> indexer.
>>>>>>>>> Can we work on this branch? (I created a Jenkins job for it:
>>>>>>>>> https://builds.apache.org/view/A-D/view/Archiva/job/archiva-jcr-oak-branch/)
>>>>>>>>> If you prefer master, I would say no worries either.
>>>>>>>>> Something else to look at is upgrading maven-core etc...
>>>>>>>>> Anyway
>>>>>>>>> Cheers
>>>>>>>>> Olivier
>>>>>>>>> 
>>>>>>>>>> On 22 June 2017 at 19:16, Martin <[email protected]> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> upgrading the maven indexer leads to some major changes.
>>>>>>>>>> Lucene is used by maven-indexer and also by jackrabbit. Jackrabbit
>>>>>>>>>> sticks to the old 3.x version and, as I see it, they will not move
>>>>>>>>>> to a newer version.
>>>>>>>>>> There is Jackrabbit Oak as an alternative.
>>>>>>>>>> I tried a proof of concept and could replace the jackrabbit
>>>>>>>>>> implementation of metadata-store-jcr with an oak implementation. At
>>>>>>>>>> least I got all the unit tests of this module to pass.
>>>>>>>>>> But switching to Oak has some drawbacks:
>>>>>>>>>> - The repository format changed and we must provide a way to migrate
>>>>>>>>>> (either migrate the existing repository or create a new one by
>>>>>>>>>> reindexing)
>>>>>>>>>> - The lucene version used is newer but does not match the version
>>>>>>>>>> from the maven-indexer dependencies. Some incompatibilities may come
>>>>>>>>>> up that are not solvable without using a modified version of one of
>>>>>>>>>> the two. Or there may be the possibility to switch to solr (as a
>>>>>>>>>> separate component) and get rid of the lucene dependencies for jcr
>>>>>>>>>> inside the archiva project.
>>>>>>>>>> 
>>>>>>>>>> Switching to maven-indexer 6.0-SNAPSHOT means some changes too:
>>>>>>>>>> - The Plexus-Sisu-Bridge does not work as before.
>>>>>>>>>> - We must migrate from the NexusIndexer to the indexer API.
>>>>>>>>>> 
>>>>>>>>>> So switching to the new indexer and oak means more work than expected
>>>>>>>>>> and some risks regarding new incompatibility problems. And I think
>>>>>>>>>> this cannot be done without broken master builds for some time
>>>>>>>>>> period.
>>>>>>>>>> 
>>>>>>>>>> So, what should we do? I think maven indexer is one of the core
>>>>>>>>>> components of archiva, and we should utilize the 3.x-version to
>>>>>>>>>> migrate to the new indexer version, even if this means switching to
>>>>>>>>>> jcr oak. Otherwise it would mean sticking to the old version for the
>>>>>>>>>> next years.
>>>>>>>>>> @Olivier, regarding the maven-indexer / sisu-Bridge API changes, I
>>>>>>>>>> hope you can provide useful help.
>>>>>>>>>> 
>>>>>>>>>> I committed the PoC to the branch feature/jcr_oak. There are some
>>>>>>>>>> modules where the tests do not pass (mainly because of the indexer
>>>>>>>>>> API changes).
>>>>>>>>>> 
>>>>>>>>>> Any comments?
>>>>>>>>>> 
>>>>>>>>>> Cheers
>>>>>>>>>> 
>>>>>>>>>> Martin
>>>>>>>>>> 
>>>>>>>>>> On Tuesday, 13 June 2017, 09:07:35 CEST, Olivier Lamy wrote:
>>>>>>>>>>> forget it but we need to ensure we can read maven index files....
>>>>>>>>>>> 
>>>>>>>>>>> On 13 June 2017 at 17:06, Olivier Lamy <[email protected]> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> Remember jackrabbit depends on Lucene as well, so upgrading Lucene
>>>>>>>>>>>> can be a problem here.
>>>>>>>>>>>> Regarding maven-indexer, yes, we can depend on a snapshot until the
>>>>>>>>>>>> release.
>>>>>>>>>>>> I can release it ;-)
>>>>>>>>>>>> 
>>>>>>>>>>>> On 13 June 2017 at 06:06, Martin <[email protected]> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the lucene version depends on the maven indexer. But I'm not sure
>>>>>>>>>>>>> about the current state of maven-indexer. The version has not
>>>>>>>>>>>>> changed since about 2013.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There have been commits on the master branch since then, and the
>>>>>>>>>>>>> lucene version has been changed too, but no releases were tagged.
>>>>>>>>>>>>> Does it make sense to switch to the maven-indexer 6.0-SNAPSHOT?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> As far as I know, there are new compact index formats with new
>>>>>>>>>>>>> lucene versions, but I'm not sure if this is relevant for the
>>>>>>>>>>>>> maven indexes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Martin
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Olivier Lamy
>>>>>>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Olivier Lamy
>> http://twitter.com/olamy | http://linkedin.com/in/olamy

-- 
This message was sent from my Android device with K-9 Mail.
