On 31 Jul 2005, at 22:28, Yen-Ju Chen wrote:

Most of them look fine to me.
I will implement the -xxxAtPath: first and see.

ok

On 7/31/05, Quentin Mathé <[EMAIL PROTECTED]> wrote:

On 27 Jul 2005, at 19:28, Quentin Mathé wrote:

There will be, but it isn't fully specified yet. Look at the
ExtendedWorkspaceKit and CoreObject pages to get some ideas (the related
descriptions are still fuzzy, I admit ;-)). I have added extra
explanations about EXPersistentObject (the virtual file system's unit/
object used by Étoilé) in a new page about indexing here:
<http://www.dromasoftware.com/etoile/mediawiki/index.php?title=FogVision_Indexing>

About CoreObject: <http://www.dromasoftware.com/etoile/mediawiki/index.php?title=CoreObject>
About ExtendedWorkspaceKit: <http://www.dromasoftware.com/etoile/mediawiki/index.php?title=ExtendedWorkspaceKit>

  The problem is that they are not implemented yet. :D

hehe it could be true.

  I would like to see that LuceneKit can be embedded in other
applications without ExtendedWorkspaceKit and CoreObject.
Therefore, I prefer that LuceneKit can support all these workspace features,
  but not depend on them.

Yes, I agree. However, ExtendedWorkspaceKit shouldn't depend on LuceneKit either (because it might be desirable to replace our LuceneKit backend in the future). It should also be very easy for GNUstep applications using LuceneKit in their own way to become Étoilé native by relying on ExtendedWorkspaceKit (that explains the LuceneKit API tweaks I requested).

- (NSArray *) types;


This method should return UTIs, I think; we would need a custom class
like EXUTI.

  I agree, but we need a specification.

You mean the UTI specification or the specification of a UTI-related class?
For the UTI specification:
Overview: <http://developer.apple.com/documentation/Carbon/Conceptual/understanding_utis/index.html>
ADC Article: <http://developer.apple.com/macosx/uniformtypeidentifiers.html>

Is there anybody willing to write such a class? :-)

/** Each kind of data has a key attribute for identification.
* The value of this key attribute must be unique.
* For example, the keyAttribute of a file is its path.
* For a contact, it must be something unique.
* This attribute must exist in the attributes returned by
* -metadataForFile:type:attributes:
*/
- (NSString *) keyAttribute;


Why not call it "identifier" or "uniqueIdentifier"? Because every
attribute tends to be a key in an index, "key attribute" doesn't sound
like very clear terminology, imho.

  -identifier looks good.

ok, I'm fine with it.

/** Add a path for indexing.
* This is not stored in the index; therefore, for each new LCIndexManager,
* it must be set before using the LCIndexManager.
* Otherwise, nothing will be indexed.
* If path is a directory, everything within this directory and its
* subdirectories will be indexed.
*/
- (void) addIndexPath: (NSString *) path;
- (void) setIndexPaths: (NSArray *) paths;
- (NSArray *) indexPaths;

We should have:
- (void) addIndexURL: (NSURL *) url;
- (void) addIndexPath: (NSString *) path;
- (void) setIndexURLs: (NSArray *) urls;
- (void) setIndexPaths: (NSArray *) paths;
- (NSArray *) indexURLs;
- (NSArray *) indexPaths;


  Good, but path first. :)

ok. I'm insisting on URL support because Apple is moving away from paths toward URLs in Cocoa more and more. Moreover, ExtendedWorkspaceKit will be URL-centric too.
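
One way to get "path first" while staying URL-ready is to keep URLs as the stored form and make the path methods thin wrappers. The following is only a minimal sketch under that assumption: SketchIndexManager is a hypothetical stand-in, not the actual LCIndexManager, and memory management is left out (it assumes ARC or garbage collection).

```objc
#import <Foundation/Foundation.h>

/* Hypothetical stand-in for LCIndexManager; only the index locations are modeled. */
@interface SketchIndexManager : NSObject
{
  NSMutableArray *_indexURLs;
}
- (void) addIndexURL: (NSURL *)url;
- (void) addIndexPath: (NSString *)path;
- (NSArray *) indexURLs;
- (NSArray *) indexPaths;
@end

@implementation SketchIndexManager
- (id) init
{
  self = [super init];
  if (self != nil)
    _indexURLs = [[NSMutableArray alloc] init];
  return self;
}

/* URLs are the stored form; duplicates are ignored. */
- (void) addIndexURL: (NSURL *)url
{
  if (url != nil && ![_indexURLs containsObject: url])
    [_indexURLs addObject: url];
}

/* The path variant only converts and forwards. */
- (void) addIndexPath: (NSString *)path
{
  [self addIndexURL: [NSURL fileURLWithPath: path]];
}

- (NSArray *) indexURLs
{
  return [NSArray arrayWithArray: _indexURLs];
}

/* Only file URLs have a meaningful path; other schemes are skipped. */
- (NSArray *) indexPaths
{
  NSMutableArray *paths = [NSMutableArray array];
  NSEnumerator *e = [_indexURLs objectEnumerator];
  NSURL *url;
  while ((url = [e nextObject]) != nil)
    {
      if ([url isFileURL])
        [paths addObject: [url path]];
    }
  return paths;
}
@end
```

With this layout, -indexPaths is a filtered view of -indexURLs, so a non-file location added later does not break path-based callers.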


/** Specify importers for a given path.
* This overrides the general indexing rules above.
* Only the specified importers will be used for path and its
* subdirectories (if sub is YES).
* If importers is nil, path will not be indexed.
*/
- (void) setIndexPath: (NSString *) path importers: (NSArray *)
importers includeSubpaths: (BOOL) sub;


We should have a URL equivalent, which means includeSubpaths: may
need to be replaced. Perhaps deep: could be used; I'm currently using it
within ExtendedWorkspaceKit, as in
- (void) indexPersistentObject: (EXPersistentObject *)obj deep: (BOOL)indexElements
Any other suggestions?


  it looks fine to me, or since we will have -setIndexPath: and
-setIndexPersistentObject:, we can use subpaths and deep for each of
them.

I'm not sure I understand what you mean here.

/** Index one file.
* This should be the method most commonly used within an application
* whenever a file (item) has changed.
* It is basically a combination of -setIndexPaths: and -indexAll.
* Since there is only one file at a time, the cost is lower.
*/
- (void) indexFileAtPath: (NSString *) path;
- (void) indexFileAtPath: (NSString *) path importer: (id
<LCImporter>) importer;

I'm not sure these methods are really adequate. I would suggest:

/* Allows indexing objects which aren't files or which are in memory */
- (void) indexItem: (id <LCIndexableItem>)item;
- (void) indexAtPath: (NSString *) path;
- (void) indexAtPath: (NSString *) path importer: (id <LCImporter>)
importer;

  Looks good.

ok

LCIndexableItem protocol could be:

- (NSDictionary *) valuesForAttributes: (NSArray *)keys error:
(NSError **)err;
- (NSString *) keyAttribute; // or -uniqueIdentifier

IndexableItem advanced support (like automatic update) would be
provided with ExtendedWorkspaceKit.

LCImporter would then be an LCIndexableItem derived protocol which
adds extra methods :

- (NSArray *) types;
- (NSDictionary *) valuesForURL: (NSURL *)url attributes: (NSArray *)
keys error: (NSError **)err;
- (NSDictionary *) valuesForPath: (NSString *)path attributes:
(NSArray *)keys error: (NSError **)err;


  This looks fine, but it's not very useful
  because LCIndexManager doesn't know which attributes to use.
  LCIndexManager has no idea about the file; LCImporter knows.
  LCIndexManager simply passes every file (object) to LCImporter,
  retrieves the metadata (NSDictionary) and puts it into the index data.

Sorry, I forgot one method in LCIndexableItem:
- (NSDictionary *) attributes; // This method name may be a bit too common
- (NSDictionary *) valuesForAttributes: (NSArray *)keys error:
(NSError **)err;
- (NSString *) keyAttribute; // or -uniqueIdentifier

An LCIndexableItem object doesn't have to be a file; it could be any object (it may not have a UTI associated with it). We would index it by requesting its attributes, then the values for the retrieved attributes. An IndexableItem could be a Mail object in a Mail application (with mails stored in a single database for each account, to take an example). Unlike importers, it wouldn't be mandatory to provide it in a separate bundle either.

LCImporter would be more specialized: it would need to be bound to one or several UTIs (aka types) for the elements it can handle. It would then be tailored to handle files, and always be provided in a bundle so that it can run without depending on the related application(s).
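
To make the two-step indexing of an item concrete (ask for its attributes, then for their values), here is a minimal sketch. The two protocols follow the declarations quoted in this thread; everything prefixed with Sketch is hypothetical, and treating the keys of -attributes as the attribute names is an assumption, since the thread does not say what that dictionary holds.

```objc
#import <Foundation/Foundation.h>

/* Protocol as proposed in this thread. */
@protocol LCIndexableItem <NSObject>
- (NSDictionary *) attributes;
- (NSDictionary *) valuesForAttributes: (NSArray *)keys error: (NSError **)err;
- (NSString *) keyAttribute; /* or -uniqueIdentifier */
@end

/* Importers add type (UTI) binding and file/URL based lookups. */
@protocol LCImporter <LCIndexableItem>
- (NSArray *) types;
- (NSDictionary *) valuesForURL: (NSURL *)url attributes: (NSArray *)keys
                          error: (NSError **)err;
- (NSDictionary *) valuesForPath: (NSString *)path attributes: (NSArray *)keys
                           error: (NSError **)err;
@end

/* Hypothetical in-memory item, e.g. one message in a mail application. */
@interface SketchMail : NSObject <LCIndexableItem>
@end

@implementation SketchMail
/* Assumption: keys are attribute names, values are a type hint. */
- (NSDictionary *) attributes
{
  return [NSDictionary dictionaryWithObjectsAndKeys:
    @"NSString", @"messageID", @"NSString", @"subject", nil];
}

- (NSDictionary *) valuesForAttributes: (NSArray *)keys error: (NSError **)err
{
  NSDictionary *all = [NSDictionary dictionaryWithObjectsAndKeys:
    @"<42@example.org>", @"messageID", @"Indexing API", @"subject", nil];
  NSMutableDictionary *result = [NSMutableDictionary dictionary];
  NSEnumerator *e = [keys objectEnumerator];
  NSString *key;
  while ((key = [e nextObject]) != nil)
    {
      id value = [all objectForKey: key];
      if (value != nil)
        [result setObject: value forKey: key];
    }
  return result;
}

- (NSString *) keyAttribute
{
  return @"messageID";
}
@end

/* Hypothetical helper: what an index manager could do in -indexItem:. */
NSDictionary *SketchMetadataForItem(id <LCIndexableItem> item, NSError **err)
{
  NSArray *keys = [[item attributes] allKeys];
  return [item valuesForAttributes: keys error: err];
}
```

The index manager never needs to know the item's class; it only needs the value stored under -keyAttribute to stay unique so the entry can be found again later.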

I hope that is clearer now?

/** Remove the document at path.
* It searches for the document, then deletes it.
*/
- (void) removeFileAtPath: (NSString *) path;


What is the purpose of this method? You mean something like
-removeIndexPath: ?

  The idea is to remove the metadata of a file from the index data.
  Otherwise, if you delete a file in the file system, its metadata will
  still be in the index data
  and show up in search results.

ok, but we would need its URL clone ;-)
It would be better to implement it as an NSFileManager category in my opinion, unless you intend this method to remove only the metadata in the index and not the file itself. Methods to remove indexed items which aren't files would be welcome too, but maybe there is already a simple and obvious API in Lucene for this, like -removeItemWithIdentifier: (NSString *)keyAttribute

About LCMetadataAttribute:

static NSString *LCMetadataChangeDateAttribute =
@"LCMetadataChangeDateAttribute";

/* These attributes will NOT be stored in index data */
static NSString *LCTextContentAttribute = @"LCTextContentAttribute";

In my opinion, it isn't Lucene's job to define which keys have to be
used with indexes; that should be worked out at the ExtendedWorkspaceKit
level.

Although it's not LuceneKit's job, I think it's better to have a common ground
  so that each application can adopt a similar scheme.
  It doesn't matter much, but I still prefer to provide some basic
metadata attributes.

ok, understood.

I think a reverse-DNS scheme à la Spotlight would fit very well; it
could be wrapped with shortcut keys like EXCreationDataMetadata (I
outline it in the FogVision Indexing spec under the "Store in Lucene
Index" section). Importers would need to be provided with an XML "schema"
to describe their own keys (possible reference:
<http://developer.apple.com/documentation/Carbon/Conceptual/MDImporters/index.html>).

  I am not very fond of XML schemas.
  They just add some extra files to maintain in the applications.
  Although it might be easier for the programmer (editing an XML file
instead of recompiling),
  the functionality is identical to me.

Right, well no opinion now… :-)
We could use a NeXT-like plist to avoid xml though, but I'm not really sure it is needed.

There are two issues in my mind:

  1. The responsibility for deciding whether the metadata of a file
needs to be updated.
    For example, a file in the file system has a modification date,
    which can be used to decide whether its metadata should be updated.
    This modification date is stored in the index data.
    My original design is that the importer is responsible for
    determining whether the metadata should be updated.
    So if -[LCIndexManager indexUpdatedFile] is called,
    LCIndexManager will pass every file to the importers.
    Each importer will decide which files should be updated by
    comparing the file's modification date with the modification date
    in the index data.
    In this case, my proposal for the importer method:
     - (BOOL) metadataForFile: (NSString *) path type: (NSString *)
type attributes: (NSMutableDictionary *) attributes
    will return NO if there is no need to update the metadata.
    It will also return NO if the importer cannot process this file
    even though the type matches.
    The other advantage is that attributes may already contain some
    data from other importers.
    Therefore, a file can pass through many importers.
    That's the reason for using NSMutableDictionary.

    Your proposal gives LCIndexManager no such information to know
    whether the metadata is the same as the old one:
    - (NSDictionary *) valuesForPath: (NSString *)path attributes:
(NSArray *)keys error: (NSError **)err
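
The "filter" design described above, where one mutable dictionary passes through several importers and each returns NO when it has nothing to contribute, might look roughly like the following. This is only a sketch: the protocol name, both importer classes, the attribute keys and the "public.plain-text" type check are hypothetical; only the method signature comes from the proposal.

```objc
#import <Foundation/Foundation.h>

/* Hypothetical protocol carrying the method signature proposed above. */
@protocol SketchImporter <NSObject>
- (BOOL) metadataForFile: (NSString *)path type: (NSString *)type
              attributes: (NSMutableDictionary *)attributes;
@end

/* Generic importer: contributes the file name for any type. */
@interface SketchNameImporter : NSObject <SketchImporter>
@end

@implementation SketchNameImporter
- (BOOL) metadataForFile: (NSString *)path type: (NSString *)type
              attributes: (NSMutableDictionary *)attributes
{
  [attributes setObject: [path lastPathComponent] forKey: @"SketchName"];
  return YES;
}
@end

/* Specialized importer: only handles plain text, returns NO otherwise. */
@interface SketchTextImporter : NSObject <SketchImporter>
@end

@implementation SketchTextImporter
- (BOOL) metadataForFile: (NSString *)path type: (NSString *)type
              attributes: (NSMutableDictionary *)attributes
{
  if (![type isEqualToString: @"public.plain-text"])
    return NO;
  [attributes setObject: @"text" forKey: @"SketchKind"];
  return YES;
}
@end

/* Passes the same dictionary through every importer in turn.
   Returns YES if at least one importer contributed metadata. */
BOOL SketchRunImporters(NSArray *importers, NSString *path, NSString *type,
                        NSMutableDictionary *attributes)
{
  BOOL changed = NO;
  NSEnumerator *e = [importers objectEnumerator];
  id <SketchImporter> importer;
  while ((importer = [e nextObject]) != nil)
    {
      if ([importer metadataForFile: path type: type attributes: attributes])
        changed = YES;
    }
  return changed;
}
```

Because later importers see what earlier ones stored, the order of the importers array matters in this design.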

 We could use methods like :
- (BOOL) prepareIndexForFile: (NSString *) path type: (NSString *) type attributes: (NSMutableDictionary *) attributes
or with error parameter…
- (BOOL) prepareIndexForFile: (NSString *) path type: (NSString *) type attributes: (NSMutableDictionary *) attributes error: (NSError **)err

provideIndexForFile:blabla: could be preferred to prepareIndexForFile:blabla: for example.

That would mean similar methods for LCIndexableItem.

I understand, but I'm not sure it is the right choice to let importers decide whether to update, because you would be unable to force reindexing when the modification date hasn't changed. Moreover, it should be noted that once the Étoilé environment is in charge of indexing, it won't be necessary to check the modification date, because requests will have been triggered by FS monitoring. I have no idea whether checking modification dates in Lucene indexes would involve real overhead for many files; if it does, the check may need to be disabled when the 'wkindexd' daemon is managing index updates.
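
To illustrate the alternative, here is a minimal sketch in which the decision lives outside the importers and can be overridden. Both function names and the force parameter are hypothetical; the only Foundation calls used are -attributesOfItemAtPath:error: and -compare:.

```objc
#import <Foundation/Foundation.h>

/* Reads the modification date from the file system; nil if unavailable. */
NSDate *SketchModificationDate(NSString *path)
{
  NSDictionary *attrs = [[NSFileManager defaultManager]
    attributesOfItemAtPath: path error: NULL];
  return [attrs objectForKey: NSFileModificationDate];
}

/* Decides whether a file should be reindexed.
   force is YES for an explicit reindex request, or when the request was
   triggered by file system monitoring and the date check is redundant. */
BOOL SketchNeedsReindex(NSDate *fileDate, NSDate *indexedDate, BOOL force)
{
  if (force)
    return YES;
  if (indexedDate == nil || fileDate == nil)
    return YES; /* never indexed, or date unknown: be conservative */
  return [fileDate compare: indexedDate] == NSOrderedDescending;
}
```

A caller such as a monitoring daemon could pass force as YES and skip the date lookup entirely, while a plain GNUstep application would pass NO and rely on the comparison.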

  2. IndexableItem vs Importer.
      Currently, an importer is like a filter: every file (object) passes
      through it and its attributes are retrieved.
      To me, an IndexableItem looks like a wrapper around each file (object)
      which knows its attributes.
      Therefore, if there are 1,000 text files, there will be 1,000
      IndexableItems but only 1 importer for text.
      Although both of them will work, I think the importer is more efficient.

Exactly. IndexableItem should be used when it is not possible to use an importer; an importer should be preferred for performance, and because it is always usable by automatic system indexing without the related application(s) being launched.

Thanks,
Quentin.

--
Quentin Mathé
[EMAIL PROTECTED]

