It might also be helpful to look at the AsterixDB external data and indexing paper in CIKM'15 for inspiration...?

On Jun 5, 2016 11:11 AM, "Preston Carman" <[email protected]> wrote:
> As we consider creating a metadata file for each index, let's consider
> what other information could be stored with the index. What types of
> functionality do we need to have a complete indexing story? As I
> understand it, we support creating an index and searching using that
> index. Would we want to show the user a list of indexes? Menaka's
> e-mail suggests we need a way to update an index. What other
> queries/features should we support around indexes?
>
> Indexing Features
> * Create index
> * Search using index
> * Update index???
> * List indexes???
> * Delete index???
>
> On Sat, Jun 4, 2016 at 10:18 PM, Menaka Madushanka
> <[email protected]> wrote:
> > Hi everyone,
> >
> > I came up with an implementation plan for the $subject. This will be
> > able to detect file content changes as well as deletions and additions.
> >
> > Methodology:
> > 1. Generate a checksum (MD5/SHA) for each file. These checksum values
> > will be written to a single properties file in the following format.
> >
> > path_to_the_file=checksum_string
> >
>
> Is there anything else that we will eventually want in a metadata file?
>
> > 2. On the first run, the checksums will be calculated and the
> > properties file will be created.
> >
>
> Sounds good.
>
> > 3. When running a query,
> >
> > The properties file will be read and loaded into memory.
> > The checksum values will be checked for each file.
> > If any modification is detected, the index will be updated and the new
> > checksum value will be stored.
> >
> > In the process of checking the checksums, the path is taken from the
> > file itself and used to retrieve the checksum for that file from the
> > properties file. So any file insertion or deletion can be detected,
> > because we consider the actual file first.
> >
>
> When you say run a query, is this an UPDATE query or a SEARCH query? I
> think at this point we only want to cause the update action to happen
> for an UPDATE query. The overhead of running an update before every
> search could be too much. Let's first get UPDATE working.
>
> > To make the process more clear, I have attached the flow diagram
> > herewith.
> >
>
> I do not see the diagram. Apache will only forward certain types of
> attachments. Can you post a link to your diagram?
>
> > I'd be very happy to have any feedback on this approach.
> >
> > Thank you very much
> > Menaka
> >
> > --
> > Menaka Madushanka Jayawardena
> > Faculty of Engineering,
> > University of Peradeniyaya.
> > LinkedIn
> > TP:- 071 885 1183/ 071 350 5470
>
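
For reference, the checksum-based change detection proposed in the quoted plan could be sketched roughly as below. This is only an illustration under stated assumptions, not the project's implementation: it is written in Python for brevity, it picks SHA-256 where the plan says "MD5/SHA", and the names file_checksum, load_checksums, save_checksums and detect_changes are hypothetical. It stores one path_to_the_file=checksum_string line per file, as in the plan, and reports added, modified and deleted files by comparing the files on disk against the stored values.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256"):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_checksums(properties_path):
    """Load path=checksum lines into a dict; a missing file means first run."""
    checksums = {}
    properties_file = Path(properties_path)
    if not properties_file.exists():
        return checksums
    for line in properties_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last '=' so paths that contain '=' still parse.
        path, _, checksum = line.rpartition("=")
        checksums[path] = checksum
    return checksums


def save_checksums(properties_path, checksums):
    """Write the checksums back, one path=checksum line per file."""
    lines = [f"{path}={checksum}" for path, checksum in sorted(checksums.items())]
    Path(properties_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def detect_changes(collection_dir, properties_path):
    """Compare files on disk with stored checksums, then store the new values.

    Returns three sorted lists of paths: (added, modified, deleted).
    Assumes the properties file lives outside collection_dir.
    """
    stored = load_checksums(properties_path)
    # Start from the actual files, so additions and deletions both show up.
    current = {
        str(p): file_checksum(p)
        for p in Path(collection_dir).rglob("*")
        if p.is_file()
    }
    added = sorted(set(current) - set(stored))
    deleted = sorted(set(stored) - set(current))
    modified = sorted(
        p for p in set(current) & set(stored) if current[p] != stored[p]
    )
    save_checksums(properties_path, current)
    return added, modified, deleted
```

In line with the comment in the thread about overhead, something like detect_changes would presumably be called only while handling an UPDATE query, and the three returned lists would tell the indexer which entries to add, re-index or drop.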
