Hi, Wojtek.

It's great to hear of your interest in this GSoC project. Your success with the Tuscany CORBA binding project in GSoC 2008 is really encouraging.

Your understanding pretty much matches what I have in mind. A few more comments.

1) Indexing: I think indexing should go beyond keywords. It will also involve indexing artifacts by QName (such as the QNames of Java classes, composites, WSDLs, XSDs and BPEL files). The runtime processing of SCA contributions can also benefit from this work. For example, Tuscany already lazily loads WSDL/XSD files when it needs to resolve references by QName. We should apply the same strategy to composite files too.

2) The search can be based on keywords, structural URIs, QNames of various artifacts, policy settings, etc.

3) The search capability could be potentially integrated with the management of the SCA domain.
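
To illustrate the lazy resolution by QName from point 1, here is a minimal sketch in Java. It is not Tuscany's actual resolver: the class name LazyQNameIndex and its methods are made up for illustration, and the loader stands in for whatever parses a composite, WSDL or XSD file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.namespace.QName;

// Hypothetical sketch: maps artifact QNames to loaders that run only on first lookup.
class LazyQNameIndex<T> {
    private final Map<QName, Supplier<T>> loaders = new HashMap<>();
    private final Map<QName, T> loaded = new HashMap<>();

    // Register an artifact under its QName without parsing it yet.
    void register(QName name, Supplier<T> loader) {
        loaders.put(name, loader);
    }

    // Resolve a QName, loading the artifact the first time it is requested.
    // Returns null when nothing is registered under that QName.
    T resolve(QName name) {
        if (loaded.containsKey(name)) {
            return loaded.get(name);
        }
        Supplier<T> loader = loaders.get(name);
        if (loader == null) {
            return null;
        }
        T artifact = loader.get();
        loaded.put(name, artifact);
        return artifact;
    }

    // True once the artifact has actually been loaded.
    boolean isLoaded(QName name) {
        return loaded.containsKey(name);
    }
}
```

Registering a composite only records how to load it; the file would be parsed on the first resolve() call for its QName, and later calls reuse the cached result.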

Thanks,
Raymond
--------------------------------------------------
From: "Wojtek Janiszewski" <[email protected]>
Sent: Monday, March 30, 2009 2:19 PM
To: <[email protected]>
Subject: [GSoC 2009] Search in SCA domain manager web app

Hi,
I'm interested in taking part in Google Summer of Code, and the project "tuscany-scadomain-search" [1] sounds interesting to me.

I've taken a quick look at the domain manager web app and Apache Lucene and made a few assumptions for a start. I defined three main areas which the project should cover: indexing, searching and presentation. Keeping those areas separated allows us to write modular code and test it.

1. Indexing

- Indexing should include all available contributions. File names as well as their contents (except non-readable files like Java classes) should be indexed. Every indexed item should have a link to its parent contribution.

- After a contribution is added, updated or deleted in the domain manager web application, the affected items should be reindexed.

- We may also consider having connections between indexed items, i.e. we could scan composite files to collect the names of their children and build reverse links, so every indexed item (script, Java class etc.) could have a connection to its parent composites.
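
The reverse links from the last point could be sketched roughly as below. This is only an illustration under my own assumptions: the class name ReverseLinks is hypothetical, and the input map stands in for the result of scanning composite files, which is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: turns a composite -> children listing into
// child -> parent-composites links.
class ReverseLinks {
    static Map<String, Set<String>> build(Map<String, List<String>> childrenByComposite) {
        Map<String, Set<String>> parentsByChild = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : childrenByComposite.entrySet()) {
            String composite = entry.getKey();
            for (String child : entry.getValue()) {
                // A child may be used by several composites, so collect all of them.
                parentsByChild.computeIfAbsent(child, k -> new TreeSet<>()).add(composite);
            }
        }
        return parentsByChild;
    }
}
```

Given such a map, each indexed item can store the names of its parent composites next to the link to its contribution.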

2. Searching

- The search feature would be accessible via the SCA domain manager web application. It should allow users to:
-- simply search for files by name
-- search file contents
-- filter, i.e. search inside a specified contribution or composite

- Maybe we should consider extras like Ajax suggestions while typing the search phrase?

- More research on Apache Lucene could provide more searching ideas.
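
To make the search modes above more concrete, here is a small in-memory sketch. It is a stand-in for a real index such as Lucene, not a use of the Lucene API, and all names in it (ContributionSearch, Item, removeContribution) are hypothetical. It also shows how dropping a contribution's items before reindexing could work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory stand-in for a real index.
class ContributionSearch {
    // One indexed file, linked to the contribution it belongs to.
    static final class Item {
        final String contribution;
        final String fileName;
        final String content; // null for non-readable files such as Java classes

        Item(String contribution, String fileName, String content) {
            this.contribution = contribution;
            this.fileName = fileName;
            this.content = content;
        }
    }

    private final List<Item> items = new ArrayList<>();

    void add(Item item) {
        items.add(item);
    }

    // Drop every item of a contribution, e.g. before reindexing it after an update.
    void removeContribution(String contribution) {
        items.removeIf(i -> i.contribution.equals(contribution));
    }

    // Case-insensitive substring search over file names and, optionally, contents.
    // A null contribution means "search all contributions".
    List<Item> search(String phrase, boolean includeContent, String contribution) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        List<Item> hits = new ArrayList<>();
        for (Item i : items) {
            if (contribution != null && !contribution.equals(i.contribution)) {
                continue;
            }
            boolean nameHit = i.fileName.toLowerCase(Locale.ROOT).contains(needle);
            boolean contentHit = includeContent && i.content != null
                    && i.content.toLowerCase(Locale.ROOT).contains(needle);
            if (nameHit || contentHit) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

A real implementation would replace the linear scan with index queries, but the three inputs (phrase, name-only versus content search, optional contribution filter) would probably stay the same.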

3. Presentation

- Each search result should be presented with its name and a link to the contribution it belongs to. If the item is viewable (i.e. it's not a Java class etc.), a simple preview feature should be enabled for it. Obviously, matched text should be highlighted (as Google does).

- If information about the parent composites of these items is available, those composites should also be listed.
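
Highlighting of matched text in a preview could look roughly like this. It is a plain-Java sketch with a made-up class name (MatchHighlighter); it assumes the preview text has already been HTML-escaped and that bold tags are an acceptable way to mark matches.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: marks matches of a search phrase in preview text.
class MatchHighlighter {
    // Wraps each case-insensitive occurrence of phrase in <b>...</b>.
    // The text is assumed to be HTML-escaped already.
    static String highlight(String text, String phrase) {
        if (phrase.isEmpty()) {
            return text;
        }
        // Pattern.quote treats the phrase literally, so characters like '.' are not wildcards.
        Pattern pattern = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE);
        // $0 refers to the matched text, which keeps its original casing.
        return pattern.matcher(text).replaceAll("<b>$0</b>");
    }
}
```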


This quick draft is the direction I'll take while writing the proposal. It looks like an interesting project, especially because it lets me explore new areas (everything beyond bindings in Tuscany, plus Lucene). There is still a lot of room for improvement (such as other features), so any comments are welcome.

Thanks,
Wojtek

[1] - http://wiki.apache.org/general/SummerOfCode2009#tuscany-scadomain-search
