On 27 Jul 05 at 19:28, Quentin Mathé wrote:

On 24 Jul 05 at 09:48, Yen-Ju Chen wrote:

I am writing headers for a high-level interface to Lucene.
It is intended to bridge GNUstep and Lucene, and probably to mimic OS X Spotlight.
Since Etoile would be the most likely user of this interface,
I would like to have some opinions on it.
You can check Etoile/Frameworks/LuceneKit/Headers/LCIndexManager.h
and Etoile/Frameworks/LuceneKit/Headers/Importer/LCImporter.h.
I have added some documentation for each method.

Basically, LCImporter is just like a Spotlight plugin, which is quite simple.
LCIndexManager is like NSFileManager.
You can add, search, remove files.
You can specify which directory/files to index or not, and which
importers to use.
Importers know what type of files they can handle.
So you can just add all the importers you have in LCIndexManager.
They will figure out which one(s) to use.

ok

The current model is based on the file system.
I don't know whether there is an abstract or virtual system in Etoile
that treats the file system, memory and databases in a uniform way.
If so, I would like to adapt to it.

There will be, but it isn't fully specified yet. Look at the ExtendedWorkspaceKit and CoreObject pages for some ideas (the related descriptions are still fuzzy, I admit ;-). I have added extra explanations about EXPersistentObject (the virtual file system's unit/object used by Étoilé) in a new page about indexing here: <http://www.dromasoftware.com/etoile/mediawiki/index.php?title=FogVision_Indexing>

About CoreObject: <http://www.dromasoftware.com/etoile/mediawiki/index.php?title=CoreObject>
About ExtendedWorkspaceKit: <http://www.dromasoftware.com/etoile/mediawiki/index.php?title=ExtendedWorkspaceKit>

I intend to make it more general so that people can use it in their
own application
without the existence of Etoile workspace.
But it will definitely depend on some of the Etoile frameworks.

Comments are very welcome.
It is easier to change the interface before I really work on the implementation. But if there are no further comments, I will start to implement it soon.


My comments are coming soon. Some changes will probably be necessary to integrate nicely with ExtendedWorkspaceKit; responsibilities may need to be shared between LuceneKit and ExtendedWorkspaceKit in a way that differs a bit from your current solution.

Hi,

Here is my reply. I have written a specification for indexing support in Étoilé with ExtendedWorkspaceKit; you can read it here: <http://www.dromasoftware.com/etoile/mediawiki/index.php?title=FogVision_Indexing>

Now some comments on current LuceneKit high-level API…

About LCImporter :

@protocol LCImporter <NSObject>

- (BOOL) metadataForFile: (NSString *) path type: (NSString *) type
              attributes: (NSMutableDictionary *) attributes;

I would prefer methods like:
- (NSDictionary *) valuesForURL: (NSURL *)url attributes: (NSArray *)keys error: (NSError **)err;
- (NSDictionary *) valuesForPath: (NSString *)path attributes: (NSArray *)keys error: (NSError **)err;

Note: within GNUstep/Cocoa, File is more often used to reference FileHandle than Path.

Methods like the ones below, which return a single value, might be welcome for performance when we need to update an indexed item; I'm not sure.
- (id) valueForURL: (NSURL *)url attribute: (NSString *)key error: (NSError **)err;
- (id) valueForPath: (NSString *)path attribute: (NSString *)key error: (NSError **)err;
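
For illustration only, here is a minimal sketch of how an importer might implement the path-based, multi-value variant suggested above. The class name LCExampleFileImporter is hypothetical, and using NSFileManager's -attributesOfItemAtPath:error: as the metadata source is an assumption made to keep the example short; a real importer would extract type-specific metadata.

```objc
#import <Foundation/Foundation.h>

/* Hypothetical importer: only the selector
 * -valuesForPath:attributes:error: comes from the proposal above. */
@interface LCExampleFileImporter : NSObject
- (NSDictionary *) valuesForPath: (NSString *)path
                      attributes: (NSArray *)keys
                           error: (NSError **)err;
@end

@implementation LCExampleFileImporter

- (NSDictionary *) valuesForPath: (NSString *)path
                      attributes: (NSArray *)keys
                           error: (NSError **)err
{
  /* Assumption: plain file attributes stand in for real metadata. */
  NSDictionary *fileAttrs = [[NSFileManager defaultManager]
    attributesOfItemAtPath: path error: err];

  if (fileAttrs == nil)
    return nil; /* NSFileManager has filled in *err if err was non-NULL */

  /* Return only the values the caller asked for. */
  NSMutableDictionary *values = [NSMutableDictionary dictionary];
  NSEnumerator *e = [keys objectEnumerator];
  NSString *key;

  while ((key = [e nextObject]) != nil)
    {
      id value = [fileAttrs objectForKey: key];
      if (value != nil)
        [values setObject: value forKey: key];
    }
  return values;
}

@end
```

A caller would pass keys such as NSFileModificationDate and get back a dictionary holding only the requested entries that exist, or nil plus an NSError when the file cannot be read.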

- (NSArray *) types;

This method should return UTIs, I think; we would need a custom class like EXUTI.

/** Each kind of data has a key attribute for identification.
* The value of this key attribute must be unique.
* For example, the keyAttribute of a file is its path.
* For a contact, it must be something unique.
* This attribute must exist in the attributes returned by -metadataForFile:type:attributes:
*/
- (NSString *) keyAttribute;

Why not call it "identifier" or "uniqueIdentifier"? Because every attribute tends to be a key in an index, "keyAttribute" isn't very clear terminology, imho.

About LCIndexManager :

@interface LCIndexManager: LCIndexModifier

+ (BOOL) indexExistsAt: (NSString *) path;

We should have:
+ (BOOL) indexExistsAtURL: (NSURL *) url;
+ (BOOL) indexExistsAtPath: (NSString *) path;

/** Initiate an index data in memory */
- (id) init;

ok

- (id) initWithPath: (NSString *) path create: (BOOL) create;

We should have:
- (id) initWithURL: (NSURL *) url create: (BOOL) create;
- (id) initWithPath: (NSString *) path create: (BOOL) create;

/** Use Lucene LCDirectory as virtual file system */
- (id) initWithDirectory: (id <LCDirectory>) directory create: (BOOL) create;

ok

/** Add path for indexing.
* This is not stored in the index; therefore, for each new LCIndexManager,
* it must be set before using the LCIndexManager.
* Otherwise, nothing will be indexed.
* If path is a directory, everything within this directory and its subdirectories will be indexed.
*/
- (void) addIndexPath: (NSString *) path;
- (void) setIndexPaths: (NSArray *) paths;
- (NSArray *) indexPaths;

We should have:
- (void) addIndexURL: (NSURL *) url;
- (void) addIndexPath: (NSString *) path;
- (void) setIndexURLs: (NSArray *) urls;
- (void) setIndexPaths: (NSArray *) paths;
- (NSArray *) indexURLs;
- (NSArray *) indexPaths;


/** Add importer for indexing.
* Each file (item) in path will be indexed by each importer.
* If two importers use the same file type,
* each file will be indexed twice.
* It is the responsibility of importer to know which type of file it should handle.
*/
- (void) addImporter: (id <LCImporter>) importer;
- (void) setImporters: (NSArray *) importers;
- (NSArray *) importers;

ok

/** Specify importers for a given path.
* This overrides the general indexing rules above.
* Only the specified importers will be used for path and its subdirectories (if sub is YES).
* If importers is nil, path will not be indexed.
*/
- (void) setIndexPath: (NSString *) path importers: (NSArray *) importers includeSubpaths: (BOOL) sub;

We should have a URL equivalent, which means includeSubpaths: may need to be replaced. Perhaps deep: could be used; I'm currently using it within ExtendedWorkspaceKit, as in - (void) indexPersistentObject: (EXPersistentObject *)obj deep: (BOOL)indexElements
Any other suggestions?

/** Index everything under -indexPaths.
* It searches for all the existing documents under -indexPaths, removes them, and adds them back.
* Warning: this is expensive. It searches, deletes, and adds to the index as necessary.
*/
- (void) indexAllFiles;

ok

/** Index new and updated files.
* It compares the value of updateAttributes to determine whether a file should be indexed.
* Warning: this is expensive. It searches, compares, deletes, and adds to the index as necessary.
*/
- (void) indexUpdatedFiles;

ok

/** Index one file.
* This should be the method most commonly used within an application whenever a file (item) changes.
* It is basically a combination of -setIndexPaths: and -indexAllFiles.
* Since there is only one file at a time, the cost is lower.
*/
- (void) indexFileAtPath: (NSString *) path;
- (void) indexFileAtPath: (NSString *) path importer: (id <LCImporter>) importer;

I'm not sure these methods are really adequate. I would suggest:

- (void) indexItem: (id <LCIndexableItem>)item; // Allows indexing objects which aren't files or which live in memory
- (void) indexAtURL: (NSURL *) url;
- (void) indexAtPath: (NSString *) path;
- (void) indexAtURL: (NSURL *) url importer: (id <LCImporter>) importer;
- (void) indexAtPath: (NSString *) path importer: (id <LCImporter>) importer;

LCIndexableItem protocol could be:

- (NSDictionary *) valuesForAttributes: (NSArray *)keys error: (NSError **)err;
- (NSString *) keyAttribute; // or -uniqueIdentifier

IndexableItem advanced support (like automatic update) would be provided with ExtendedWorkspaceKit.

LCImporter would then be an LCIndexableItem derived protocol which adds extra methods :

- (NSArray *) types;
- (NSDictionary *) valuesForURL: (NSURL *)url attributes: (NSArray *)keys error: (NSError **)err;
- (NSDictionary *) valuesForPath: (NSString *)path attributes: (NSArray *)keys error: (NSError **)err;
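
Put together, the two protocols suggested above might be declared roughly as follows. This is only a sketch of the proposal, not existing LuceneKit code; it assumes that the keys argument of -valuesForAttributes:error: is an NSArray, as in the importer methods, and that LCIndexableItem itself adopts NSObject.

```objc
#import <Foundation/Foundation.h>

/* Anything that can be indexed, including in-memory objects. */
@protocol LCIndexableItem <NSObject>
/* Assumption: keys is an NSArray of attribute names. */
- (NSDictionary *) valuesForAttributes: (NSArray *)keys
                                 error: (NSError **)err;
- (NSString *) keyAttribute; /* or -uniqueIdentifier */
@end

/* An importer is an indexable item that also knows how to read
 * values from a location (URL or path) for the types it handles. */
@protocol LCImporter <LCIndexableItem>
- (NSArray *) types;
- (NSDictionary *) valuesForURL: (NSURL *)url
                     attributes: (NSArray *)keys
                          error: (NSError **)err;
- (NSDictionary *) valuesForPath: (NSString *)path
                      attributes: (NSArray *)keys
                           error: (NSError **)err;
@end
```

With this layering, -indexItem: only needs the LCIndexableItem part, while the URL and path variants rely on the extra LCImporter methods.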

/** Remove the document at path.
* It searches for the document, then deletes it.
*/
- (void) removeFileAtPath: (NSString *) path;

What is the purpose of this method? Do you mean something like -removeIndexPath:?

/* Search */
/** Return search result based on query.
* Return an array of values at keyAttribute.
*/
- (NSArray *) searchWithString: (NSString *) query;
/** Return search result based on query. */
- (NSArray *) searchWithQuery: (LCQuery *) query;

ok (I still need to write FogVision Searching specification where I will describe FogVision/ExtendedWorkspaceKit search API :-)

/* Advanced functions */
- (LCIndexReader *) indexReader;
- (LCIndexWriter *) indexWriter;

ok

/** Return LCDocument at path */
- (LCDocument *) documentAtPath: (NSString *) path;

We should have -documentAtURL: too.

/** default analyzer */
- (void) setAnalyzer: (LCAnalyzer *) analyzer;
- (LCAnalyzer *) analyzer;


- (id <LCDirectory>) directory;

ok

About LCMetadataAttribute:

static NSString *LCMetadataChangeDateAttribute = @"LCMetadataChangeDateAttribute";
static NSString *LCContentCreationDateAttribute = @"LCContentCreationDateAttribute";
static NSString *LCContentModificationDateAttribute = @"LCContentModificationDateAttribute";
static NSString *LCContentTypeAttribute = @"LCContentTypeAttribute";
static NSString *LCCreatorAttribute = @"LCCreatorAttribute";
static NSString *LCEmailAddressAttribute = @"LCEmailAddressAttribute";
static NSString *LCIdentifierAttribute = @"LCIdentifierAttribute";
static NSString *LCPathAttribute = @"LCPathAttribute";

/* These attributes will NOT be stored in index data */
static NSString *LCTextContentAttribute = @"LCTextContentAttribute";

In my opinion, it isn't Lucene's job to define which keys have to be used with indexes; that should be worked out at the ExtendedWorkspaceKit level. I think a reverse-DNS scheme à la Spotlight would fit very well; it could be wrapped with shortcut keys like EXCreationDataMetadata (I outline this in the FogVision Indexing spec, under the "Store in Lucene Index" section). Importers would need to be provided with an XML "schema" to describe their own keys (possible reference: <http://developer.apple.com/documentation/Carbon/Conceptual/MDImporters/index.html>).
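
To make the shortcut-key idea concrete, here is a small sketch in which short constants wrap reverse-DNS key strings. The constant names and the key strings are purely hypothetical; they are not taken from the FogVision Indexing spec or from Spotlight.

```objc
#import <Foundation/Foundation.h>

/* Hypothetical shortcut constants wrapping reverse-DNS metadata keys. */
static NSString * const EXExampleCreationDateKey =
  @"org.etoile.metadata.creationDate";
static NSString * const EXExampleContentTypeKey =
  @"org.etoile.metadata.contentType";

/* An importer would return values under these keys, so that
 * LuceneKit itself never has to define the key names. */
static NSDictionary *EXExampleMetadata(NSDate *created, NSString *type)
{
  return [NSDictionary dictionaryWithObjectsAndKeys:
    created, EXExampleCreationDateKey,
    type, EXExampleContentTypeKey,
    nil];
}
```

The point of the design is that the indexing layer only sees opaque strings, while the workspace layer owns the naming scheme.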

That's all ;-)

Thanks,
Quentin.

--
Quentin Mathé
[EMAIL PROTECTED]

