On 27 Jul 05, at 19:28, Quentin Mathé wrote:
On 24 Jul 05, at 09:48, Yen-Ju Chen wrote:
I am writing headers for a high-level interface to Lucene.
It is intended to bridge GNUstep and Lucene, and probably to mimic
OS X Spotlight.
Since Etoile would be the most likely user of this interface,
I would like to have some opinions on it.
You can check Etoile/Frameworks/LuceneKit/Headers/LCIndexManager.h
and Etoile/Frameworks/LuceneKit/Headers/Importer/LCImporter.h
I have added some documentation for each method.
Basically, LCImporter is just like a Spotlight plugin, which is
quite simple.
LCIndexManager is like NSFileManager.
You can add, search, remove files.
You can specify which directory/files to index or not, and which
importers to use.
Importers know what type of files they can handle.
So you can just add all the importers you have in LCIndexManager.
They will figure out which one(s) to use.
ok
The current model is based on the file system.
I don't know whether there is an abstract or virtual system in Etoile
which treats the file system, memory and databases in a uniform way.
If so, I would like to adapt to it.
There will be, but it isn't fully specified yet. Look at the
ExtendedWorkspaceKit and CoreObject pages to get some ideas (the
related descriptions are still fuzzy, I admit ;-). I have added extra
explanations about EXPersistentObject (the virtual file system's
unit/object used by Étoilé) in a new page about Indexing here:
<http://www.dromasoftware.com/etoile/mediawiki/index.php?title=FogVision_Indexing>
About CoreObject:
<http://www.dromasoftware.com/etoile/mediawiki/index.php?title=CoreObject>
About ExtendedWorkspaceKit:
<http://www.dromasoftware.com/etoile/mediawiki/index.php?title=ExtendedWorkspaceKit>
I intend to make it more general so that people can use it in their
own applications
without the Etoile workspace being present.
But it will definitely depend on some of the Etoile frameworks.
Comments are very welcome.
It is easier to change the interface before I really work on the
implementation.
But if there are no further comments, I will start implementing it
soon.
My comments are coming soon. Some changes will probably be
necessary to integrate nicely with ExtendedWorkspaceKit;
responsibilities between LuceneKit and ExtendedWorkspaceKit may
need to be shared a bit differently from your current solution.
Hi,
Here is my reply. I have written a specification for Indexing support
in Étoilé with ExtendedWorkspaceKit; you can read it here:
<http://www.dromasoftware.com/etoile/mediawiki/index.php?title=FogVision_Indexing>
Now some comments on the current LuceneKit high-level API…
About LCImporter :
@protocol LCImporter <NSObject>
- (BOOL) metadataForFile: (NSString *) path type: (NSString *) type
attributes: (NSMutableDictionary *) attributes;
I would prefer methods like:
- (NSDictionary *) valuesForURL: (NSURL *)url attributes: (NSArray *)
keys error: (NSError **)err;
- (NSDictionary *) valuesForPath: (NSString *)path attributes:
(NSArray *)keys error: (NSError **)err;
Note: within GNUstep/Cocoa, "File" is more often used to refer to a
FileHandle than to a Path.
Methods like the ones below, which return a single value, might be
welcome for performance when we need to update an indexed item; I
don't know…
- (id) valueForURL: (NSURL *)url attribute: (NSString *)key error:
(NSError **)err;
- (id) valueForPath: (NSString *)path attribute: (NSString *)key
error: (NSError **)err;
- (NSArray *) types;
This method should return UTIs, I think; we would need a custom class
like EXUTI.
/** Each kind of data has a key attribute for identification.
* The value of this key attribute must be unique.
* For example, the keyAttribute of a file is its path.
* For a contact, it must be something unique.
* This attribute must exist in the attributes returned by
* -metadataForFile:type:attributes:
*/
- (NSString *) keyAttribute;
Why not call it "identifier" or "uniqueIdentifier"? Because every
attribute tends to be a key in an index, "keyAttribute" isn't very
clear terminology, imho.
About LCIndexManager :
@interface LCIndexManager: LCIndexModifier
+ (BOOL) indexExistsAt: (NSString *) path;
We should have:
+ (BOOL) indexExistsAtURL: (NSURL *) url;
+ (BOOL) indexExistsAtPath: (NSString *) path;
/** Initiate an index data in memory */
- (id) init;
ok
- (id) initWithPath: (NSString *) path create: (BOOL) create;
We should have:
- (id) initWithURL: (NSURL *) url create: (BOOL) create;
- (id) initWithPath: (NSString *) path create: (BOOL) create;
/** Use Lucene LCDirectory as virtual file system */
- (id) initWithDirectory: (id <LCDirectory>) directory create:
(BOOL) create;
ok
/** Add path for indexing.
* This is not stored in the index; therefore, for each new
* LCIndexManager, it must be set before the LCIndexManager is used.
* Otherwise, nothing will be indexed.
* If path is a directory, everything within this directory and its
* subdirectories will be indexed.
*/
- (void) addIndexPath: (NSString *) path;
- (void) setIndexPaths: (NSArray *) paths;
- (NSArray *) indexPaths;
We should have:
- (void) addIndexURL: (NSURL *) url;
- (void) addIndexPath: (NSString *) path;
- (void) setIndexURLs: (NSArray *) urls;
- (void) setIndexPaths: (NSArray *) paths;
- (NSArray *) indexURLs;
- (NSArray *) indexPaths;
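One way to keep the URL and path variants consistent is to store URLs only and let the path methods convert at the boundary. The sketch below illustrates that idea under stated assumptions: IndexPathsSketch is a hypothetical stand-in class, not part of LuceneKit, and storing only URLs is a design choice assumed for the example rather than something decided in this thread.

```objc
#import <Foundation/Foundation.h>

/* GCC does not define __has_feature; treat every feature as absent there. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Hypothetical stand-in for the index-path part of LCIndexManager. */
@interface IndexPathsSketch : NSObject
{
  NSMutableArray *indexURLs; /* single source of truth: NSURL objects */
}
- (void) addIndexURL: (NSURL *) url;
- (void) addIndexPath: (NSString *) path;
- (NSArray *) indexURLs;
- (NSArray *) indexPaths;
@end

@implementation IndexPathsSketch

- (id) init
{
  self = [super init];
  if (self != nil)
    {
      indexURLs = [[NSMutableArray alloc] init];
    }
  return self;
}

#if !__has_feature(objc_arc)
/* Manual reference counting, as commonly used with GNUstep. */
- (void) dealloc
{
  [indexURLs release];
  [super dealloc];
}
#endif

- (void) addIndexURL: (NSURL *) url
{
  /* Ignore nil and duplicates so each location is registered once. */
  if (url != nil && ![indexURLs containsObject: url])
    {
      [indexURLs addObject: url];
    }
}

- (void) addIndexPath: (NSString *) path
{
  /* The path variant only converts and forwards to the URL variant. */
  [self addIndexURL: [NSURL fileURLWithPath: path]];
}

- (NSArray *) indexURLs
{
  return [NSArray arrayWithArray: indexURLs];
}

- (NSArray *) indexPaths
{
  /* Only file URLs have a meaningful local path. */
  NSMutableArray *paths = [NSMutableArray array];
  NSEnumerator *e = [indexURLs objectEnumerator];
  NSURL *url;

  while ((url = [e nextObject]) != nil)
    {
      if ([url isFileURL])
        {
          [paths addObject: [url path]];
        }
    }
  return paths;
}

@end
```

With this arrangement, -indexPaths silently omits any non-file URL, which may or may not be the desired behaviour for remote locations.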
/** Add importer for indexing.
* Each file (item) in path will be indexed by each importer.
* If two importers handle the same file type,
* each file will be indexed twice.
* It is the responsibility of the importer to know which types of
* file it should handle.
*/
- (void) addImporter: (id <LCImporter>) importer;
- (void) setImporters: (NSArray *) importers;
- (NSArray *) importers;
ok
/** Specify importers for a given path.
* This overrides the general rules of indexing above.
* Only the specified importers will be used for path and its
* subdirectories (if sub is YES).
* If importers is nil, path will not be indexed.
*/
- (void) setIndexPath: (NSString *) path importers: (NSArray *)
importers includeSubpaths: (BOOL) sub;
We should have a URL equivalent, which means includeSubpaths: may
need to be replaced… perhaps deep: could be used. I'm currently using
it within ExtendedWorkspaceKit, as in
- (void) indexPersistentObject: (EXPersistentObject *)obj deep: (BOOL)indexElements
Any other suggestions?
/** Index everything under -indexPaths.
* It searches for all existing documents under -indexPaths, removes
* them, and adds them back.
* Warning: this is costly. It searches and deletes, then adds to the
* index if necessary.
*/
- (void) indexAllFiles;
ok
/** Index new or updated files.
* It compares the values of updateAttributes to determine whether a
* file should be indexed.
* Warning: this is costly. It searches, compares and deletes, then
* adds to the index if necessary.
*/
- (void) indexUpdatedFiles;
ok
/** Index one file.
* This should be the most commonly used method within an application
* when a file (item) changes.
* It is basically a combination of -setIndexPaths: and -indexAllFiles.
* Since there is only one file at a time, the cost is lower.
*/
- (void) indexFileAtPath: (NSString *) path;
- (void) indexFileAtPath: (NSString *) path importer: (id
<LCImporter>) importer;
I'm not sure these methods are really adequate; I would suggest:
- (void) indexItem: (id <LCIndexableItem>)item; // Allows indexing
objects which aren't files or which live in memory
- (void) indexAtURL: (NSURL *) url;
- (void) indexAtPath: (NSString *) path;
- (void) indexAtURL: (NSURL *) url importer: (id <LCImporter>)
importer;
- (void) indexAtPath: (NSString *) path importer: (id <LCImporter>)
importer;
The LCIndexableItem protocol could be:
- (NSDictionary *) valuesForAttributes: (NSArray *)keys error:
(NSError **)err;
- (NSString *) keyAttribute; // or -uniqueIdentifier
Advanced IndexableItem support (like automatic updates) would be
provided by ExtendedWorkspaceKit.
LCImporter would then be a protocol derived from LCIndexableItem which
adds extra methods:
- (NSArray *) types;
- (NSDictionary *) valuesForURL: (NSURL *)url attributes: (NSArray *)
keys error: (NSError **)err
- (NSDictionary *) valuesForPath: (NSString *)path attributes:
(NSArray *)keys error: (NSError **)err
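Put together, the two protocols suggested above might look roughly like the following header sketch. It only illustrates the proposal and is not an existing LuceneKit header: the keys parameter of -valuesForAttributes:error: is assumed to be an NSArray of attribute names (by analogy with the importer methods), and the choice between -keyAttribute and -uniqueIdentifier is left open.

```objc
#import <Foundation/Foundation.h>

/* Sketch of the proposed protocols; names follow the suggestions in this
 * mail, and the comments describe assumed semantics. */
@protocol LCIndexableItem <NSObject>

/* Values for the requested attribute keys. Assumed convention: returns
 * nil and fills *err when the values cannot be produced. */
- (NSDictionary *) valuesForAttributes: (NSArray *)keys
                                 error: (NSError **)err;

/* Name of the attribute whose value identifies the item uniquely
 * (could be renamed -uniqueIdentifier). */
- (NSString *) keyAttribute;

@end

/* An importer is an indexable item that can also extract values from
 * content located by URL or by path. */
@protocol LCImporter <LCIndexableItem>

/* Content types this importer can handle (UTIs in the proposal). */
- (NSArray *) types;

- (NSDictionary *) valuesForURL: (NSURL *)url
                     attributes: (NSArray *)keys
                          error: (NSError **)err;

- (NSDictionary *) valuesForPath: (NSString *)path
                      attributes: (NSArray *)keys
                           error: (NSError **)err;

@end
```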
/** Remove the document at path.
* It do search, delete.
*/
- (void) removeFileAtPath: (NSString *) path;
What is the purpose of this method? Do you mean something like
-removeIndexPath: ?
/* Search */
/** Return search result based on query.
* Return an array of values at keyAttribute.
*/
- (NSArray *) searchWithString: (NSString *) query;
/** Return search result based on query. */
- (NSArray *) searchWithQuery: (LCQuery *) query;
ok (I still need to write FogVision Searching specification where I
will describe FogVision/ExtendedWorkspaceKit search API :-)
/* Advanced functions */
- (LCIndexReader *) indexReader;
- (LCIndexWriter *) indexWriter;
ok
/** Return LCDocument at path */
- (LCDocument *) documentAtPath: (NSString *) path;
We should have -documentAtURL: too.
/** default analyzer */
- (void) setAnalyzer: (LCAnalyzer *) analyzer;
- (LCAnalyzer *) analyzer;
- (id <LCDirectory>) directory;
ok
About LCMetadataAttribute:
static NSString *LCMetadataChangeDateAttribute =
@"LCMetadataChangeDateAttribute";
static NSString *LCContentCreationDateAttribute =
@"LCContentCreationDateAttribute";
static NSString *LCContentModificationDateAttribute =
@"LCContentModificationDateAttribute";
static NSString *LCContentTypeAttribute = @"LCContentTypeAttribute";
static NSString *LCCreatorAttribute = @"LCCreatorAttribute";
static NSString *LCEmailAddressAttribute = @"LCEmailAddressAttribute";
static NSString *LCIdentifierAttribute = @"LCIdentifierAttribute";
static NSString *LCPathAttribute = @"LCPathAttribute";
/* These attributes will NOT be stored in index data */
static NSString *LCTextContentAttribute = @"LCTextContentAttribute";
In my opinion, it isn't Lucene's job to define which keys have to be
used with indexes; that should be worked out at the
ExtendedWorkspaceKit level.
I think a reverse-DNS scheme à la Spotlight would fit very well; it
could be wrapped with shortcut keys like EXCreationDataMetadata (I
outline it in the FogVision Indexing spec under the "Store in Lucene
Index" section). Importers would need to be provided with an XML
"schema" to describe their own keys (possible reference:
<http://developer.apple.com/documentation/Carbon/Conceptual/MDImporters/index.html>).
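As a rough illustration of the shortcut-key idea, a constant could simply wrap a reverse-DNS string. Only the shortcut name EXCreationDataMetadata comes from this mail; the string value and the helper function below are made up for the example and are not part of any ExtendedWorkspaceKit specification.

```objc
#import <Foundation/Foundation.h>

/* Hypothetical key: the reverse-DNS string is invented for illustration. */
static NSString * const EXCreationDataMetadata =
  @"org.etoile.metadata.creationDate";

/* Hypothetical helper showing how an importer might key its values with
 * the shortcut constant instead of a raw string. */
static NSDictionary *ExampleValues(NSDate *created)
{
  return [NSDictionary dictionaryWithObject: created
                                     forKey: EXCreationDataMetadata];
}
```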
That's all ;-)
Thanks,
Quentin.
--
Quentin Mathé