Re: Automatically updating Index

Till Westmann Tue, 07 Jun 2016 06:42:44 -0700

Hi,

we should also consider how the issue of data/index consistency is tackled
in AsterixDB [1]. It doesn’t automatically update indexes, but it ensures
consistency and thus allows the optimizer to choose an index without
changing the result of the query.

The approach might not be the right one for VXQuery right now, but it would
be good to take a look :)


Cheers,
Till

[1] http://dl.acm.org/citation.cfm?id=2806428

On 6 Jun 2016, at 20:52, Menaka Madushanka wrote:

Hello,

Sorry, Preston. Here is the link to the image.
https://drive.google.com/file/d/0B-2mdAzfAj07Z0w4RVZ2SGFfTFk/view?usp=sharing
I came up with this approach thinking that the index should be updated
automatically (without user intervention) whenever any of the XML files
changes. What I described in the proposal was also automatic index
updating.
I didn't see the new issue that Steven added about it:
https://issues.apache.org/jira/browse/VXQUERY-198.
As Steven mentioned, the update process should be designed so that only the
changed files (updated, deleted or inserted) are updated in the index.
Is there anything else that we will eventually want in a metadata file?
I think that, since we are trying to track the modified files, a
content-based checksum is the best way to do it. We could check the
last-modified date instead, but relying on that single factor is not fully
reliable, because it can change depending on the clock of the user's machine.
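To illustrate why a content-based checksum is more robust than the last-modified date, here is a minimal Java sketch using the standard `java.security.MessageDigest` API. The class and method names are hypothetical, and SHA-256 is just one choice (MD5 works the same way via `MessageDigest.getInstance("MD5")`). The demo only shows that touching a file's timestamp leaves its checksum unchanged, while editing its content changes it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: content-based checksums for XML files. */
public final class FileChecksum {
    private FileChecksum() {}

    /** SHA-256 of the given bytes, as a lowercase hex string. */
    public static String checksumOf(byte[] data) {
        MessageDigest digest = newDigest();
        digest.update(data);
        return toHex(digest.digest());
    }

    /** SHA-256 of a file's contents, read in chunks so large files are not loaded at once. */
    public static String checksumOf(Path file) throws IOException {
        MessageDigest digest = newDigest();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return toHex(digest.digest());
    }

    private static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".xml");
        Files.write(file, "<book>1</book>".getBytes(StandardCharsets.UTF_8));
        String original = checksumOf(file);

        // Changing only the timestamp (e.g. a skewed clock) leaves the checksum unchanged.
        Files.setLastModifiedTime(file, FileTime.fromMillis(1_000L));
        System.out.println(original.equals(checksumOf(file))); // true

        // Changing the content produces a different checksum.
        Files.write(file, "<book>2</book>".getBytes(StandardCharsets.UTF_8));
        System.out.println(original.equals(checksumOf(file))); // false

        Files.delete(file);
    }
}
```

Reading the file in fixed-size chunks keeps memory use flat even for very large XML collections. The trade-off compared with a timestamp check is that every file has to be read in full to detect a change.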

Besides the checksum value, I think we can store some information about the
index that corresponds to each file, which would make updating the index
much easier. (I still have to check whether this is possible.)

When you say run a query, is this an UPDATE query or a SEARCH query? I
think at this point we only want to cause the update action to happen
for an UPDATE query. The overhead of updating the index before searching
could be too much. Let's first get UPDATE working.
I thought this should run in a SEARCH query (as I was not fully aware of the
update index query). So my suggestion was that, when running a search query,
it would first check for any file changes; if there were any, it would update
the corresponding index and then run the search on it. As you mentioned, that
would have a huge overhead. So we can use this method to detect the changed
files and update the index in an UPDATE query.

Thank you very much
Menaka


On 6 June 2016 at 03:02, Steven Jacobs <[email protected]> wrote:
In addition to Preston's comments, we also need to start thinking about the
Lucene side. Once we know a file needs to be changed in the index, how does
this change take place? Looking at how things are stored now will help with
this.
Steven

On Sunday, June 5, 2016, Preston Carman <[email protected]> wrote:
As we consider creating a metadata file for each index, let's consider
what other information could be stored with the index. What types of
functionality do we need to have a complete indexing story?
As I understand it, we support creating an index and searching using
that index. Would we want to show the user a list of indexes? Menaka's
e-mail suggests we need a way to update an index. What other
queries/features should we support around indexes?

Indexing Features
 * Create index
 * Search using index
 * Update index???
 * List indexes???
 * Delete index???

On Sat, Jun 4, 2016 at 10:18 PM, Menaka Madushanka
<[email protected]> wrote:
Hi everyone,
I came up with an implementation plan for the $subject. It will be able to
detect file content changes as well as deletions and additions.

Methodology:
1. Generate a checksum (MD5/SHA) for each file. These checksum values will
be written to a single properties file in the following format:

path_to_the_file=checksum_string
Is there anything else that we will eventually want in a metadata file?
2. On the first run, the checksums will be calculated and the properties
file will be created.
Sounds good.
3. When running a query,

The properties file will be read and loaded into memory.
The checksum value will be checked for each file.
If any modification is detected, the index will be updated and the new
checksum value will be stored.
While checking, the path is taken from the file itself and used to retrieve
that file's checksum from the properties.
So file insertions and deletions can also be detected, because we start
from the actual files: an inserted file has no stored checksum, and a
deleted file leaves a stored entry that matches no file.
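The three steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not VXQuery code. The class `IndexMetadata` and its methods are hypothetical. It uses the standard `java.util.Properties` class for the `path_to_the_file=checksum_string` format. The current checksums are assumed to have been computed elsewhere (e.g. MD5 or SHA-256 over each file's contents). The Lucene update that would consume the detected changes is not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: the path=checksum metadata file and change detection over it. */
public final class IndexMetadata {
    private IndexMetadata() {}

    /** Paths grouped by the kind of change found since the last run. */
    public static final class Changes {
        public final Set<String> inserted = new TreeSet<>();
        public final Set<String> deleted = new TreeSet<>();
        public final Set<String> modified = new TreeSet<>();

        public boolean isEmpty() {
            return inserted.isEmpty() && deleted.isEmpty() && modified.isEmpty();
        }

        @Override
        public String toString() {
            return "inserted=" + inserted + " deleted=" + deleted + " modified=" + modified;
        }
    }

    /** Steps 1-2: write one path_to_the_file=checksum_string entry per file. */
    public static void save(Map<String, String> checksums, Path file) throws IOException {
        Properties props = new Properties();
        props.putAll(checksums);
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            // store() escapes ':', '=', '\' and spaces in keys, so paths
            // such as C:\data\books 1.xml survive the round trip through load().
            props.store(out, "path_to_the_file=checksum_string");
        }
    }

    /** Step 3: read the properties file back into memory. */
    public static Map<String, String> load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            result.put(key, props.getProperty(key));
        }
        return result;
    }

    /** Step 3: compare stored checksums with checksums of the files currently on disk. */
    public static Changes diff(Map<String, String> stored, Map<String, String> current) {
        Changes changes = new Changes();
        // Start from the actual files: an unknown path is an insertion,
        // a known path with a different checksum is a modification.
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String previous = stored.get(entry.getKey());
            if (previous == null) {
                changes.inserted.add(entry.getKey());
            } else if (!previous.equals(entry.getValue())) {
                changes.modified.add(entry.getKey());
            }
        }
        // A stored path that no longer matches any file is a deletion.
        for (String path : stored.keySet()) {
            if (!current.containsKey(path)) {
                changes.deleted.add(path);
            }
        }
        return changes;
    }

    public static void main(String[] args) throws IOException {
        Path metadata = Files.createTempFile("index-metadata", ".properties");
        save(Map.of("a.xml", "111", "b.xml", "222", "c.xml", "333"), metadata);

        // Pretend b.xml was edited, c.xml deleted and d.xml added since the last run.
        Map<String, String> current = Map.of("a.xml", "111", "b.xml", "999", "d.xml", "444");
        System.out.println(diff(load(metadata), current));
        // prints: inserted=[d.xml] deleted=[c.xml] modified=[b.xml]
        Files.delete(metadata);
    }
}
```

Starting from the files on disk finds insertions and modifications. The second loop over the stored entries is what catches deletions, which a pass over the current files alone cannot see.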
When you say run a query, is this an UPDATE query or a SEARCH query? I
think at this point we only want to cause the update action to happen
for an UPDATE query. The overhead of updating the index before searching
could be too much. Let's first get UPDATE working.
To make the process clearer, I have attached the flow diagram.
I do not see the diagram. Apache will only forward certain types of
attachments. Can you post a link to your diagram?
I'd be very happy to have any feedback on this approach.

Thank you very much
Menaka

--
Menaka Madushanka Jayawardena
Faculty of Engineering,
University of Peradeniya.
LinkedIn
TP:- 071 885 1183/ 071 350 5470
