In addition to Preston's comments, we also need to start thinking about the Lucene side. Once we know a file needs to be changed in the index, how does this change take place? Looking at how things are stored now will help with this.

Steven
On Sunday, June 5, 2016, Preston Carman <[email protected]> wrote:

> As we consider creating a metadata file for each index, let's consider
> what other information could be stored with the index. What types of
> functionality do we need to have a complete indexing story?
> As I understand it, we support creating an index and searching using
> that index. Would we want to show the user a list of indexes? Menaka's
> e-mail suggests we need a way to update an index. What other
> queries/features should we support around indexes?
>
> Indexing Features
> * Create index
> * Search using index
> * Update index???
> * List indexes???
> * Delete index???
>
> On Sat, Jun 4, 2016 at 10:18 PM, Menaka Madushanka
> <[email protected]> wrote:
> > Hi everyone,
> >
> > I came up with an implementation plan for the $subject. This will be
> > able to detect file content changes as well as deletions and additions.
> >
> > Methodology:
> > 1. Generate a checksum (MD5/SHA) for each file. These checksum values
> > will be written to a single properties file in the following format.
> >
> > path_to_the_file=checksum_string
> >
>
> Is there anything else that we will eventually want in a metadata file?
>
> >
> > 2. On the first run, the checksums will be calculated and the
> > properties file will be created.
> >
>
> Sounds good.
>
> > 3. When running a query,
> >
> > The properties file will be read and loaded into memory.
> > The checksum values will be checked for each file.
> > If any modification is detected, the index will be updated and the new
> > checksum value will be stored.
> >
> > In the process of checking the checksum, the path is taken from the
> > file itself and used to retrieve the checksum for that file from the
> > properties file. So any file insertion or deletion can be detected,
> > because we consider the actual file first.
> >
>
> When you say run a query, is this an UPDATE query or a SEARCH query? I
> think at this point we only want to cause the update action to happen
> for an UPDATE query. The overhead of updating an index before searching
> could be too much. Let's first get UPDATE working.
>
> > To make the process more clear, I have attached the flow diagram
> > herewith.
> >
>
> I do not see the diagram. Apache will only forward certain types of
> attachments. Can you post a link to your diagram?
>
> > I'd be very happy to have any feedback on this approach.
> >
> > Thank you very much
> > Menaka
> >
> > --
> > Menaka Madushanka Jayawardena
> > Faculty of Engineering,
> > University of Peradeniya.
> > LinkedIn
> > TP:- 071 885 1183/ 071 350 5470
>
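
To make the checksum comparison in the quoted plan concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions, not the project's implementation: the class name IndexChangeDetector, the Changes holder and the diff method are hypothetical, SHA-256 is picked arbitrarily from the "MD5/SHA" options, and the two maps stand in for the path=checksum properties file and for the checksums computed from the files currently on disk. Note that it walks the stored paths as well as the current ones, since a deleted file no longer exists to be looked at first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: compares stored checksums against freshly computed ones.
class IndexChangeDetector {

    // Holds the paths that an UPDATE would need to act on.
    static final class Changes {
        final Set<String> added = new TreeSet<>();
        final Set<String> modified = new TreeSet<>();
        final Set<String> deleted = new TreeSet<>();
    }

    // Hex-encoded SHA-256 digest of the given file content.
    static String checksum(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // stored: path -> checksum as read from the properties file.
    // current: path -> checksum computed from the files present now.
    static Changes diff(Map<String, String> stored, Map<String, String> current) {
        Changes changes = new Changes();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String previous = stored.get(entry.getKey());
            if (previous == null) {
                changes.added.add(entry.getKey());      // no stored checksum: new file
            } else if (!previous.equals(entry.getValue())) {
                changes.modified.add(entry.getKey());   // checksum differs: content changed
            }
        }
        for (String path : stored.keySet()) {
            if (!current.containsKey(path)) {
                changes.deleted.add(path);              // stored but no longer on disk
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new TreeMap<>();
        stored.put("a.xml", checksum("one".getBytes(StandardCharsets.UTF_8)));
        stored.put("b.xml", checksum("two".getBytes(StandardCharsets.UTF_8)));

        Map<String, String> current = new TreeMap<>();
        current.put("a.xml", checksum("one, edited".getBytes(StandardCharsets.UTF_8)));
        current.put("c.xml", checksum("three".getBytes(StandardCharsets.UTF_8)));

        Changes changes = diff(stored, current);
        System.out.println("added=" + changes.added
                + " modified=" + changes.modified
                + " deleted=" + changes.deleted);
    }
}
```

The three sets map onto the open Lucene question: added and modified paths would need their documents (re)written to the index and deleted paths would need theirs removed, after which the stored checksums can be rewritten. How that is best done depends on how documents are keyed in the index today.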
