Hello,

I'm sorry, Preston. Here is the link for the image:
https://drive.google.com/file/d/0B-2mdAzfAj07Z0w4RVZ2SGFfTFk/view?usp=sharing

I came up with this approach thinking that the index should be updated
automatically, without user intervention, whenever any of the XML files
changes. What I added to the proposal was also updating the index
automatically. I had not seen the new issue that Steven added about it:
https://issues.apache.org/jira/browse/VXQUERY-198. As Steven mentioned, we
need to decide on an update process in which only the changed files
(updated, deleted, or inserted) are updated in the index.

> Is there anything else that we will eventually want in a metadata file?

I think that, as we are trying to track modified files, a content-based
checksum is the best way to do it. We could check the last-modified date
instead, but that is not a fully reliable method: it depends on a single
factor, which can also change with the clock of the user's machine. Besides
the checksum value, I think we can store some information about the index
entries relevant to each file, so that updating the index becomes very easy.
(I have to check whether this is possible.)

> When you say run a query, is this an UPDATE query or a SEARCH query? I
> think at this point we only want to cause the update action to happen
> for an UPDATE query. The overhead of updating the index before searching
> could be too much. Let's first get UPDATE working.

I thought this should run in a search query, as I was not fully aware of the
update index query. My suggestion was that a search query would first check
for any file changes and, if there were any, update the corresponding index
and then search on it. As you mentioned, that would have a huge overhead. So
we can instead use this method to detect the changed files and update the
index in the update query.

Thank you very much,
Menaka

On 6 June 2016 at 03:02, Steven Jacobs <[email protected]> wrote:

> In addition to Preston's comments, we also need to start thinking about the
> Lucene side.
> Once we know a file needs to be changed in the index, how does
> this change take place? Looking at how things are stored now will help with
> this.
> Steven
>
> On Sunday, June 5, 2016, Preston Carman <[email protected]> wrote:
>
> > As we consider creating a metadata file for each index, let's consider
> > what other information could be stored with the index. What
> > types of functionality do we need to have a complete indexing story?
> > As I understand it, we support creating an index and searching using
> > that index. Would we want to show the user a list of indexes? Menaka's
> > e-mail suggests we need a way to update an index. What other
> > queries/features should we support around indexes?
> >
> > Indexing Features
> > * Create index
> > * Search using index
> > * Update index???
> > * List indexes???
> > * Delete index???
> >
> > On Sat, Jun 4, 2016 at 10:18 PM, Menaka Madushanka
> > <[email protected]> wrote:
> > > Hi everyone,
> > >
> > > I came up with an implementation plan for the $subject. This will be
> > > able to detect file content changes as well as deletions and additions.
> > >
> > > Methodology:
> > > 1. Generate a checksum (MD5/SHA) for each file. These checksum values
> > > will be written to a single properties file in the following format:
> > >
> > > path_to_the_file=checksum_string
> > >
> >
> > Is there anything else that we will eventually want in a metadata file?
> >
> > > 2. On the first run, the checksum will be calculated and the
> > > properties file will be created.
> > >
> >
> > Sounds good.
> >
> > > 3. When running a query:
> > >
> > > The properties file will be read and loaded into memory.
> > > The checksum values will be checked for each file.
> > > If any modification is detected, the index will be updated and the new
> > > checksum value will be stored.
> > >
> > > In the process of checking the checksum, the path is taken from the
> > > file itself and used to retrieve that file's checksum from the
> > > properties. So any file insertion or deletion can be detected, because
> > > we consider the actual file first.
> > >
> >
> > When you say run a query, is this an UPDATE query or a SEARCH query? I
> > think at this point we only want to cause the update action to happen
> > for an UPDATE query. The overhead of updating the index before searching
> > could be too much. Let's first get UPDATE working.
> >
> > > To make the process clearer, I have attached the flow diagram
> > > herewith.
> > >
> >
> > I do not see the diagram. Apache will only forward certain types of
> > attachments. Can you post a link to your diagram?
> >
> > > I'd be very happy to have any feedback on this approach.
> > >
> > > Thank you very much
> > > Menaka
> > >
> > > --
> > > Menaka Madushanka Jayawardena
> > > Faculty of Engineering,
> > > University of Peradeniya.
> > > LinkedIn
> > > TP:- 071 885 1183/ 071 350 5470
> > >

--
*Menaka Madushanka Jayawardena*
Faculty of Engineering, <http://www.pdn.ac.lk/eng>
University of Peradeniya.
LinkedIn <http://lk.linkedin.com/in/menakajayawardena>
TP:- 071 885 1183/ 071 350 5470
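
P.S. To make the checksum idea discussed in this thread concrete, here is a
minimal sketch in Java. It is only an illustration under stated assumptions,
not VXQuery code: the class ChecksumTracker and its methods (checksum, scan,
load, store, diff) are hypothetical names, MD5 is just one of the digests
mentioned above, and the metadata file is assumed to live outside the
collection directory. It does not touch the Lucene index; it only reports
which files were added, modified, or deleted since the metadata was written.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of VXQuery. */
final class ChecksumTracker {

    /** Paths that differ between the stored metadata and the files on disk. */
    static final class Changes {
        final Set<String> added = new TreeSet<>();
        final Set<String> modified = new TreeSet<>();
        final Set<String> deleted = new TreeSet<>();
    }

    /** Hex-encoded MD5 of the file content (reads the whole file into memory). */
    static String checksum(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Builds path -> checksum for every regular file under root. */
    static Map<String, String> scan(Path root) throws IOException, NoSuchAlgorithmException {
        Map<String, String> result = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : paths.filter(Files::isRegularFile).collect(Collectors.toList())) {
                result.put(p.toString(), checksum(p));
            }
        }
        return result;
    }

    /** Loads path=checksum pairs; returns an empty map on the first run. */
    static Map<String, String> load(Path metadataFile) throws IOException {
        Map<String, String> result = new TreeMap<>();
        if (!Files.exists(metadataFile)) {
            return result;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(metadataFile)) {
            props.load(in);
        }
        for (String key : props.stringPropertyNames()) {
            result.put(key, props.getProperty(key));
        }
        return result;
    }

    /** Writes path=checksum pairs to a single properties file. */
    static void store(Map<String, String> checksums, Path metadataFile) throws IOException {
        Properties props = new Properties();
        props.putAll(checksums);
        try (OutputStream out = Files.newOutputStream(metadataFile)) {
            props.store(out, "path_to_the_file=checksum_string");
        }
    }

    /** Compares stored checksums with the current ones. */
    static Changes diff(Map<String, String> stored, Map<String, String> current) {
        Changes changes = new Changes();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String old = stored.get(e.getKey());
            if (old == null) {
                changes.added.add(e.getKey());
            } else if (!old.equals(e.getValue())) {
                changes.modified.add(e.getKey());
            }
        }
        for (String path : stored.keySet()) {
            if (!current.containsKey(path)) {
                changes.deleted.add(path); // in the metadata but no longer on disk
            }
        }
        return changes;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: ChecksumTracker <collection-dir> <metadata-file>");
            return;
        }
        Path meta = Paths.get(args[1]);
        Map<String, String> current = scan(Paths.get(args[0]));
        Changes c = diff(load(meta), current);
        System.out.println("added=" + c.added + " modified=" + c.modified + " deleted=" + c.deleted);
        store(current, meta); // a real UPDATE would change the index before this step
    }
}
```

Comparing the two maps in both directions is what lets the sketch report
deletions as well as insertions: walking only the files on disk finds new and
changed files, while walking the stored paths finds the ones that have
disappeared.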
