Re: Re: Re: MapFile.get() has a bug?
Maybe you could wrap the keys in a WritableComparable object that combines the key with an integer, so you have something like:

( [k1, 0], v1 )
( [k1, 1], v2 )
( [k1, 2], v3 )

Then when you want to read the values for k1, look for [k1, 0], and keep reading until the key is no longer k1.

On 11/28/06, Feng Jiang [EMAIL PROTECTED] wrote:
> Thanks, I understand what happened. But is there any way to work around it? Because one key has too many values, it is impossible to wrap all the values into one Writable object, so I have to append (k1, v1), (k1, v2) and so on. Any ideas?
>
> Feng
>
> On 11/28/06, Albert Chern [EMAIL PROTECTED] wrote:
>> Well, I looked at the source and I can tell you WHY it happens, but I'm not sure if the behavior is correct or not. Basically the MapFile keeps an index of where each key is; this index is how the MapFile seeks quickly to the correct record. However, there is a parameter called the index interval controlling how many index entries there are. Every time the size of the map file hits a multiple of the index interval, an index entry is written. Therefore, it is possible that an index entry is not added for the first occurrence of a key, but one of the later ones. The reader will then seek to one of those instead of the first.
>>
>> This does seem to be inconsistent with the fact that you are allowed to insert equal key records. I suspect perhaps the developers meant for MapFile records to be uniquely keyed, but in MapFile.Writer.checkKey() they used a > where they intended a >= or something.
>>
>> On 11/27/06, Feng Jiang [EMAIL PROTECTED] wrote:
>>> Sorry, I made a misspelling :) I do have such a file, but I am wondering why the reader is NOT positioned at the first entry of that key.
>>>
>>> On 11/28/06, Feng Jiang [EMAIL PROTECTED] wrote:
>>>> In the MapFile.Writer.checkKey() method, an identical key is OK; it only fails if you append a new key which is less than the last key. I do have such a file, but I am wondering why the reader is positioned at the first entry of that key.
>>>>
>>>> best wishes,
>>>> Feng
>>>>
>>>> On 11/28/06, Stefan Groschupf [EMAIL PROTECTED] wrote:
>>>>> Hi,
>>>>> Aren't keys in a map file unique? I'm surprised that you are able to write such a file.
>>>>>
>>>>> Stefan
>>>>>
>>>>> On 27.11.2006, at 22:15, Feng Jiang wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> For example, I have a MapFile, which is like:
>>>>>>
>>>>>> K - V
>>>>>> 1 - 1
>>>>>> 1 - 2
>>>>>> 1 - 3
>>>>>> 2 - 1
>>>>>> 2 - 2
>>>>>> 2 - 3
>>>>>> 3 - 1
>>>>>> 3 - 2
>>>>>> 3 - 3
>>>>>>
>>>>>> When I call mapFile.get(2, value), the value will be filled as 2, not 1. Is it a bug in MapFile? I think the reader should be positioned at the first entry of the named key. Am I right?
>>>>>>
>>>>>> Thanks and best regards,
>>>>>> Feng Jiang
>>>>>
>>>>> ~~~
>>>>> 101tec Inc.
>>>>> search tech for web 2.1
>>>>> Menlo Park, California
>>>>> http://www.101tec.com
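
For readers who want to see the composite-key workaround from the top of this message in concrete form, here is a minimal sketch in plain Java. It is an illustration under assumptions, not code from the thread: `SeqKey` and `SeqKeyDemo` are hypothetical names, and a `TreeMap` stands in for the sorted file so the example runs without Hadoop. A Hadoop version would presumably implement WritableComparable and add the `write` and `readFields` serialization methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical composite key: the original key plus a sequence number. */
final class SeqKey implements Comparable<SeqKey> {
    final int key;
    final int seq;

    SeqKey(int key, int seq) {
        this.key = key;
        this.seq = seq;
    }

    /** Order by the original key first, then by sequence number. */
    @Override
    public int compareTo(SeqKey other) {
        int c = Integer.compare(key, other.key);
        return c != 0 ? c : Integer.compare(seq, other.seq);
    }
}

final class SeqKeyDemo {
    /** Builds the file from the thread: keys 1..3, each with values 1..3. */
    static TreeMap<SeqKey, Integer> sampleFile() {
        TreeMap<SeqKey, Integer> file = new TreeMap<>();
        for (int key = 1; key <= 3; key++) {
            for (int seq = 0; seq < 3; seq++) {
                file.put(new SeqKey(key, seq), seq + 1); // every composite key is unique
            }
        }
        return file;
    }

    /** Looks up [key, 0] and keeps reading until the key part changes. */
    static List<Integer> valuesFor(TreeMap<SeqKey, Integer> file, int key) {
        List<Integer> values = new ArrayList<>();
        for (Map.Entry<SeqKey, Integer> e : file.tailMap(new SeqKey(key, 0), true).entrySet()) {
            if (e.getKey().key != key) {
                break; // the run of records for this key has ended
            }
            values.add(e.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(valuesFor(sampleFile(), 2)); // prints [1, 2, 3]
    }
}
```

Because every composite key is unique, a lookup of [k, 0] has only one record it can land on, so the reader no longer depends on which of several equal keys happens to be indexed.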
Re: Re: Re: MapFile.get() has a bug?
Great idea!!! Thank you so much!!!

Best wishes,
Feng
Re: MapFile.get() has a bug?
Albert Chern wrote:
> Every time the size of the map file hits a multiple of the index interval, an index entry is written. Therefore, it is possible that an index entry is not added for the first occurrence of a key, but one of the later ones. The reader will then seek to one of those instead of the first. This does seem to be inconsistent with the fact that you are allowed to insert equal key records.

Yes, I agree that this is confusing and arguably a bug.

> I suspect perhaps the developers meant for MapFile records to be uniquely keyed, but in MapFile.Writer.checkKey() they used a > where they intended a >= or something.

I think what actually happened was that I originally coded it to prohibit equal keys, then at some point found an application (somewhere in Nutch) where equal keys were useful, and changed MapFile to support them, not realizing the consequences. Sigh. I don't know whether Nutch still relies on this or not.

MapFile could probably be fixed by changing the way the index is created, to write the location of the first instance of any run of equal keys. We could also avoid recording two instances of equal keys in the index: for a long run of equal keys, we could wait until the key changes before emitting a new index entry.

Doug
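
The behavior discussed in this message, and one reading of the proposed fix, can be sketched with a toy model in plain Java. This is not Hadoop's MapFile code: `SparseIndexDemo`, its methods, and the interval of 4 are all assumptions chosen so that the nine-record file from the thread reproduces the symptom. "Size" is taken to mean the number of records written so far, and the reader is modeled as starting from the last index entry whose key is not greater than the target.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a sorted record file with a sparse index (not Hadoop code). */
final class SparseIndexDemo {
    /** One index entry: a key and the record position it points at. */
    static final class Entry {
        final int key;
        final int position;

        Entry(int key, int position) {
            this.key = key;
            this.position = position;
        }
    }

    /** Described behavior: index every record whose position is a multiple of the interval. */
    static List<Entry> indexByCount(int[] keys, int interval) {
        List<Entry> index = new ArrayList<>();
        for (int pos = 0; pos < keys.length; pos++) {
            if (pos % interval == 0) {
                index.add(new Entry(keys[pos], pos));
            }
        }
        return index;
    }

    /** Proposed fix, as interpreted here: once an entry is due, wait for the next key change. */
    static List<Entry> indexByRunStart(int[] keys, int interval) {
        List<Entry> index = new ArrayList<>();
        int sinceLast = interval; // makes the very first record eligible
        for (int pos = 0; pos < keys.length; pos++) {
            boolean runStart = pos == 0 || keys[pos] != keys[pos - 1];
            if (runStart && sinceLast >= interval) {
                index.add(new Entry(keys[pos], pos));
                sinceLast = 0;
            }
            sinceLast++;
        }
        return index;
    }

    /** Jumps to the last index entry with key <= target, then scans forward for a match. */
    static int seek(int[] keys, List<Entry> index, int target) {
        int start = 0;
        for (Entry e : index) {
            if (e.key > target) {
                break;
            }
            start = e.position;
        }
        for (int pos = start; pos < keys.length && keys[pos] <= target; pos++) {
            if (keys[pos] == target) {
                return pos;
            }
        }
        return -1; // key not present
    }

    public static void main(String[] args) {
        int[] keys = {1, 1, 1, 2, 2, 2, 3, 3, 3};
        int[] values = {1, 2, 3, 1, 2, 3, 1, 2, 3};
        System.out.println(values[seek(keys, indexByCount(keys, 4), 2)]);    // prints 2
        System.out.println(values[seek(keys, indexByRunStart(keys, 4), 2)]); // prints 1
    }
}
```

In this model the count-based index records key 2 at position 4, the second record of its run, which matches the reported get(2) result of 2. The run-start index never points into the middle of a run, so the forward scan always reaches the first record for the key.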