On Mon, May 11, 2009 at 10:42 AM, Ondrej Bilka <[email protected]> wrote: > On Mon, May 11, 2009 at 10:39:11AM +0200, James Youngman wrote: >> On Mon, May 11, 2009 at 8:14 AM, Ondrej Bilka <[email protected]> wrote: >> > Hello, >> > Was signature method discused here? search gives only pgp. >> > Its that you give to each letter bit create signature. >> > When searching you first create signature from largest continuous part not >> > containing /. Then you can compare signature of each directory and >> > filename. >> > There is risk it will be slowdown because you need to read more. >> >> I'm sorry, I did not understand what you were trying to express. >> Would you try again with an example? > Sure. Say you have files aaa ,baba, abcd and aaccee. Signatures are > 10000, 11000, 11110 and 10101
I don't understand the mapping you are implying here. > If we want find *ba*, it coresponds to 11000 But above, you indicated that 11000 represented baba. I don't understand what mapping or calculation you are trying to describe. > and by comparing signatures we know that we can match baba or abcd but not > aaa and aaccee. > It has problem that if path is long signature would consist from all 1 Are you suggesting some kind of Bloom filter? > and we dont get any information. But we can split path to parts separated by > slashes and do it partwise. If pattern is a?bcd ? can be / so we seek bcd. > We know that /ab/bc/cd cant contain pattern because signatures > 11000,01100,00110 dont contain 01110. As far as I can tell though, we've exchanged one variety of substring matching for another. But perhaps I have misunderstood something about what you are trying to describe. Are you talking about using something like the Rabin-Karp method? > Signature would be 4 bytes long by hash widechar%32. Well max(x%32) == 11111 binary, so I guess I have some idea why all your examples have 5 binary digits. But I still have no idea what you're really trying to describe, I'm afraid. James.
