Re: [GSoC 2009] Add search capability to index/search artifacts in the SCA domain

Adriano Crestani Wed, 01 Apr 2009 01:27:50 -0700

Hi Phillipe,

Very good and detailed proposal :)

In addition, for every artifact that the indexed artifact is related to,
extra information can be added using a Lucene feature called payloads; this
information could describe the relationship between the elements.
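To make the payload idea concrete, here is a minimal sketch in plain Java that models it without Lucene: a toy inverted index where each posting of a related artifact's name carries a one-byte payload encoding the relationship. The `Relation` values and the `PayloadIndex` class are hypothetical; in recent Lucene versions the same bytes would be attached to tokens through a `PayloadAttribute` set in a custom `TokenFilter`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical relationship kinds, encoded as a one-byte payload. */
enum Relation {
    REFERENCES, IMPLEMENTED_BY, CONTAINED_IN;

    byte toPayload() {
        return (byte) ordinal();
    }

    static Relation fromPayload(byte b) {
        return values()[b];
    }
}

/** Toy inverted index: term -> postings, each posting carrying payload bytes. */
class PayloadIndex {
    /** One occurrence of a term in an indexed artifact, plus its payload. */
    static final class Posting {
        final String docId;
        final byte[] payload;

        Posting(String docId, byte[] payload) {
            this.docId = docId;
            this.payload = payload;
        }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** Records that artifact docId is related to relatedArtifact via relation. */
    void addRelated(String docId, String relatedArtifact, Relation relation) {
        postings.computeIfAbsent(relatedArtifact, k -> new ArrayList<>())
                .add(new Posting(docId, new byte[] {relation.toPayload()}));
    }

    /** Artifacts containing the term, filtered by the relation stored in the payload. */
    List<String> find(String term, Relation relation) {
        List<String> hits = new ArrayList<>();
        for (Posting p : postings.getOrDefault(term, List.of())) {
            if (Relation.fromPayload(p.payload[0]) == relation) {
                hits.add(p.docId);
            }
        }
        return hits;
    }
}
```

The same term (for example a component name) can then be matched under different relationships without a separate field per relationship type.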

I like this relationship idea. Have you thought about extending the Lucene
query parser so that new syntax could be supported? We could extend it to
support something like isreferenced("StoreCatalog"), so that every component
referenced by StoreCatalog would be returned. Then again, maybe we could do
this using a Lucene field, which would be much faster. Anyway, there are cool
features that could be built using payloads; we just need to come up with
some good ideas :)
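A rough sketch of the field-based alternative, assuming each component document stores a hypothetical `referencedBy` field listing the components that reference it: the custom `isreferenced(...)` syntax is rewritten into an ordinary field clause before the query reaches the regular parser, so no parser grammar has to change. `RelationQuerySyntax` is an illustrative name only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-parser that maps function-style syntax onto a field clause. */
class RelationQuerySyntax {
    // Matches isreferenced("Name"), allowing spaces inside the parentheses.
    private static final Pattern IS_REFERENCED =
            Pattern.compile("isreferenced\\(\\s*\"([^\"]+)\"\\s*\\)");

    /** Rewrites every isreferenced("X") into referencedBy:"X"; other text is kept. */
    static String rewrite(String userQuery) {
        Matcher m = IS_REFERENCED.matcher(userQuery);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // quoteReplacement protects names containing '$' or '\'.
            m.appendReplacement(out,
                    Matcher.quoteReplacement("referencedBy:\"" + m.group(1) + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Whether the rewritten clause then matches depends on the `referencedBy` field being analyzed the same way at index and query time (for example, keeping component names untokenized); that part is left out here.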

To handle different file types, file analyzers will be implemented to
extract the text from them. For example, a .class file is a binary file, but
the method names (mainly the ones annotated with SCA annotations) could be
extracted using the Java Reflection API. File analyzers could also call other
analyzers recursively; for example, a .composite file could be analyzed
using a CompositeAnalyzer, and when it reaches the implementation.java node
it could invoke a JavaClassAnalyzer, and so on. This way each type of file
will have only its significant text indexed; otherwise, if the file were
parsed using a common text file analyzer, every search for "component" would
match every composite file, because each one contains a "<component>" element
declaration.
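The recursive delegation described above could look roughly like this sketch. `FileAnalyzer`, `AnalyzerRegistry` and the `"java-class"` key are hypothetical names, and `JavaClassAnalyzer` loads the class with `Class.forName` for simplicity, whereas a real indexer would presumably use the contribution's class loader and could filter on SCA annotations. Element names such as `<component>` never become terms; only attribute values are indexed.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical contract: extract only the terms worth indexing from an artifact. */
interface FileAnalyzer {
    List<String> analyze(String input) throws Exception;
}

/** Hypothetical lookup so analyzers can delegate nested artifacts to each other. */
class AnalyzerRegistry {
    private final Map<String, FileAnalyzer> byType = new HashMap<>();

    void register(String type, FileAnalyzer analyzer) {
        byType.put(type, analyzer);
    }

    FileAnalyzer forType(String type) {
        return byType.get(type);
    }
}

/** Indexes a class's simple name and its declared method names via reflection. */
class JavaClassAnalyzer implements FileAnalyzer {
    @Override
    public List<String> analyze(String className) throws Exception {
        // Assumption: the class is loadable from the current class loader.
        Class<?> type = Class.forName(className);
        List<String> terms = new ArrayList<>();
        terms.add(type.getSimpleName());
        for (Method method : type.getDeclaredMethods()) {
            terms.add(method.getName());
        }
        return terms;
    }
}

/** Indexes component names and delegates implementation.java to the class analyzer. */
class CompositeAnalyzer implements FileAnalyzer {
    private final AnalyzerRegistry registry;

    CompositeAnalyzer(AnalyzerRegistry registry) {
        this.registry = registry;
    }

    @Override
    public List<String> analyze(String compositeXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // composites normally use an SCA namespace
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(compositeXml.getBytes(StandardCharsets.UTF_8)));
        List<String> terms = new ArrayList<>();
        NodeList components = doc.getElementsByTagNameNS("*", "component");
        for (int i = 0; i < components.getLength(); i++) {
            Element component = (Element) components.item(i);
            terms.add(component.getAttribute("name"));
            NodeList impls = component.getElementsByTagNameNS("*", "implementation.java");
            for (int j = 0; j < impls.getLength(); j++) {
                String className = ((Element) impls.item(j)).getAttribute("class");
                FileAnalyzer delegate = registry.forType("java-class");
                // Recurse into the class when an analyzer is registered, else keep the name.
                terms.addAll(delegate != null ? delegate.analyze(className) : List.of(className));
            }
        }
        return terms;
    }
}
```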

This is really what I had in mind: something that extracts only the
relevant information, because search is also about good results. It is not
as simple as just finding them, otherwise Google would not be so famous and
you probably would never be applying for GSoC :) ...I think we should also
implement an analyzer for compressed files; there are many jars in a domain,
and we cannot just ignore them.
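A minimal sketch of a compressed-file analyzer using only `java.util.zip`: it lists every entry of a jar or zip and descends into nested archives, naming nested entries with a hypothetical `outer.jar!/entry` path convention. `ArchiveAnalyzer` is an illustrative name, and handing each entry to the analyzer for its file type is left as a comment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical analyzer for .jar/.zip artifacts, recursing into nested archives. */
class ArchiveAnalyzer {
    /** Returns the path of every file entry, with nested entries as "a.jar!/b". */
    static List<String> entries(byte[] archive) throws IOException {
        List<String> paths = new ArrayList<>();
        collect(archive, "", paths);
        return paths;
    }

    private static void collect(byte[] archive, String prefix, List<String> paths)
            throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String path = prefix + entry.getName();
                paths.add(path);
                String lower = entry.getName().toLowerCase();
                if (lower.endsWith(".jar") || lower.endsWith(".zip")) {
                    // On a ZipInputStream, readAllBytes reads only the current entry.
                    collect(zip.readAllBytes(), path + "!/", paths);
                }
                // Other entries would be passed to the analyzer for their file type here.
            }
        }
    }
}
```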

Now, about the "searching" section of your proposal: it's fine. I think
Lucene already gives us a good query parser for user input. It's a good idea
to implement everything as an SCA component, and one of the services it
could provide is searching not only with a query text but also accepting
Lucene query objects as input. An app using the search component could
have a very user-friendly interface where the user checks checkboxes and
uses other high-level GUI components to refine a query; in such cases, when
the app executes the search it would probably generate the Lucene query
objects directly instead of building a query string.
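The two entry points could be exposed by a single service interface, as in this sketch. To stay self-contained it uses a hypothetical `ArtifactQuery` model instead of Lucene's classes; with Lucene, the GUI would build `TermQuery` objects and combine them in a `BooleanQuery` with `BooleanClause.Occur.MUST`, and the text form would go through Lucene's query parser rather than the toy `field:value AND field:value` parser shown here. `SearchService` and `InMemorySearchService` are illustrative names, not an existing Tuscany API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical query model standing in for Lucene's TermQuery / BooleanQuery. */
interface ArtifactQuery extends Predicate<Map<String, String>> {
    static ArtifactQuery term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    static ArtifactQuery allOf(List<ArtifactQuery> clauses) {
        return doc -> clauses.stream().allMatch(clause -> clause.test(doc));
    }
}

/** Hypothetical service interface offering both entry points. */
interface SearchService {
    List<String> search(String queryText);

    List<String> search(ArtifactQuery query);
}

class InMemorySearchService implements SearchService {
    private final Map<String, Map<String, String>> docs;

    InMemorySearchService(Map<String, Map<String, String>> docs) {
        this.docs = new TreeMap<>(docs); // sorted ids give a stable result order
    }

    /** Text entry point: parses "field:value AND field:value" into query objects. */
    @Override
    public List<String> search(String queryText) {
        List<ArtifactQuery> clauses = new ArrayList<>();
        for (String clause : queryText.trim().split("\\s+AND\\s+")) {
            String[] parts = clause.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected field:value but got " + clause);
            }
            clauses.add(ArtifactQuery.term(parts[0].trim(), parts[1].trim()));
        }
        return search(ArtifactQuery.allOf(clauses));
    }

    /** Object entry point: GUI code can build queries directly and call this. */
    @Override
    public List<String> search(ArtifactQuery query) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : docs.entrySet()) {
            if (query.test(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

Because the text entry point just builds query objects and delegates, both paths share one execution route.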

The results will be displayed using a tree layout, similar to what the
Eclipse IDE does [see image below] with its text search results, but instead
of a tree like project > package > class > text fragment containing the
searched text, it would be, for example, node > contribution > component >
file.composite > text fragment containing the searched text. This is just an
example; the way the results are displayed can still be discussed on the
community mailing list.

Hey, this is a good way to display results, because in the results you can
already see the relationships between artifacts. Maybe we could work on
expanding the result tree down to files inside compressed files or methods
inside class files. I think this display model could be extended not only to
display results, but also to display every artifact in the domain manager
web app.
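A small sketch of how hits could be folded into such a tree, assuming each hit arrives as a path of labels (node, contribution, component, file, fragment). `ResultTree` is a hypothetical name; a path could just as well continue into jar entries or class methods, as suggested above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical result tree: hits sharing a path prefix share tree branches. */
class ResultTree {
    // LinkedHashMap keeps children in insertion (that is, ranking) order.
    private final Map<String, ResultTree> children = new LinkedHashMap<>();

    /** Inserts one hit given as a path of labels, reusing existing branches. */
    void add(List<String> path) {
        ResultTree current = this;
        for (String label : path) {
            current = current.children.computeIfAbsent(label, k -> new ResultTree());
        }
    }

    /** Renders the tree with two spaces of indentation per level. */
    String render() {
        StringBuilder out = new StringBuilder();
        render(out, 0);
        return out.toString();
    }

    private void render(StringBuilder out, int depth) {
        for (Map.Entry<String, ResultTree> e : children.entrySet()) {
            out.append("  ".repeat(depth)).append(e.getKey()).append('\n');
            e.getValue().render(out, depth + 1);
        }
    }
}
```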

I think you might want to double the time for the "Implementing text and
file analyzer for indexing" phase.

+1 from me too :)

Adriano Crestani

On Wed, Apr 1, 2009 at 12:02 AM, Phillipe Ramalho <[email protected]> wrote:

> Thanks Luciano,
>
> You might start thinking on how you are going to integrate with the
> runtime, possibly the contribution processing as a new phase or a new
> type of processor ?
>
> OK, I will investigate that further and add some details about it to my
> proposal. I will let everyone know when I update it.
>
> Best Regards,
> Phillipe Ramalho
>
> On Tue, Mar 31, 2009 at 10:29 AM, Luciano Resende <[email protected]> wrote:
>
>> On Tue, Mar 31, 2009 at 1:04 AM, Phillipe Ramalho
>> <[email protected]> wrote:
>> > Hi everyone,
>> >
>> > This is my proposal for the project "Add search capability to
>> > index/search artifacts in the SCA domain" described at [1]. I already
>> > submitted the proposal on the GSoC web page and added it to the Tuscany
>> > Wiki proposals at [2].
>> >
>> > Any criticism, suggestions, comments, or reviews will be appreciated.
>> >
>> > I think there are some points that could be improved in the proposal,
>> > and I'm still working on them, mainly the points I say should be
>> > discussed with the community, so any comments about those will also be
>> > appreciated :)
>> >
>>
>>
>> Looks really good, and very detailed....
>>
>> You might start thinking on how you are going to integrate with the
>> runtime, possibly the contribution processing as a new phase or a new
>> type of processor ?
>>
>> Anyway, +1 from me.
>>
>> > Thanks in advance,
>> > Phillipe Ramalho
>> >
>>
>>
>>
>> --
>> Luciano Resende
>> Apache Tuscany, Apache PhotArk
>> http://people.apache.org/~lresende
>> http://lresende.blogspot.com/
>>
>
>
>
> --
> Phillipe Ramalho
>
