Re: Re: Re: MapFile.get() has a bug?

2006-11-28 Thread Albert Chern

Maybe you could wrap the keys in a WritableComparable object that
combines the key with an integer, so you have something like:

( [k1, 0], v1 )
( [k1, 1], v2 )
( [k1, 2], v3 )

Then when you want to read the values for k1, look for [k1, 0], and
keep reading until the key is no longer k1.
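A minimal Java sketch of this suggestion, under stated assumptions. The class name CompositeKey and the helper valuesFor() are hypothetical. A real version would implement org.apache.hadoop.io.WritableComparable; this one only mirrors its write/readFields/compareTo methods with plain java.io, so it compiles without Hadoop. A TreeMap stands in for the sorted MapFile. Because every composite key is unique, the sparse index can no longer land on a "later" copy of [k1, 0].

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical composite key: (original key, sequence number).
// A real version would implement org.apache.hadoop.io.WritableComparable;
// this one only mirrors its methods so it compiles without Hadoop.
public class CompositeKey implements Comparable<CompositeKey> {
    private String key;
    private int seq;

    public CompositeKey() {} // empty instance, filled in by readFields()

    public CompositeKey(String key, int seq) {
        this.key = key;
        this.seq = seq;
    }

    public String getKey() { return key; }
    public int getSeq() { return seq; }

    // Serialize the original key, then the sequence number.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(key);
        out.writeInt(seq);
    }

    // Read the fields back in the same order write() produced them.
    public void readFields(DataInput in) throws IOException {
        key = in.readUTF();
        seq = in.readInt();
    }

    // Sort by original key, then by sequence number, so all records
    // for one key are contiguous and [key, 0] comes first.
    @Override
    public int compareTo(CompositeKey other) {
        int c = key.compareTo(other.key);
        return c != 0 ? c : Integer.compare(seq, other.seq);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CompositeKey && compareTo((CompositeKey) o) == 0;
    }

    @Override
    public int hashCode() { return Objects.hash(key, seq); }

    // Start at [key, 0] and keep reading until the original key changes.
    public static List<String> valuesFor(TreeMap<CompositeKey, String> sorted, String key) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<CompositeKey, String> e : sorted.tailMap(new CompositeKey(key, 0), true).entrySet()) {
            if (!e.getKey().getKey().equals(key)) {
                break;
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        TreeMap<CompositeKey, String> file = new TreeMap<>();
        file.put(new CompositeKey("k1", 0), "v1");
        file.put(new CompositeKey("k1", 1), "v2");
        file.put(new CompositeKey("k1", 2), "v3");
        file.put(new CompositeKey("k2", 0), "w1");
        System.out.println(valuesFor(file, "k1")); // [v1, v2, v3]

        // Round-trip one key through the Writable-style methods.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new CompositeKey("k1", 2).write(new DataOutputStream(bytes));
        CompositeKey copy = new CompositeKey();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.getKey() + "," + copy.getSeq()); // k1,2
    }
}
```

The sequence number only has to be unique per original key; the writer would keep a counter that resets to 0 whenever the original key changes.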

On 11/28/06, Feng Jiang [EMAIL PROTECTED] wrote:

Thanks, I understand what happened now.

But is there any way to work around it?

Because one key has far too many values, it is impossible to wrap all
the values into one Writable object, so I have to append (k1, v1), (k1,
v2), and so on.

Any ideas?

Feng

On 11/28/06, Albert Chern [EMAIL PROTECTED] wrote:

 Well, I looked at the source and I can tell you WHY it happens, but
 I'm not sure if the behavior is correct or not.  Basically the MapFile
 keeps an index of where each key is; this index is how the MapFile
 seeks quickly to the correct record.  However, there is a parameter
 called the index interval controlling how many index entries there
 are.  Every time the size of the map file hits a multiple of the index
 interval, an index entry is written.  Therefore, it is possible that
 an index entry is not added for the first occurrence of a key, but one
 of the later ones.  The reader will then seek to one of those instead
 of the first.
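To make the effect concrete, here is a simplified, hypothetical in-memory model of a file with a sparse index, loaded with Feng's example data. It is not Hadoop's code: it assumes the index interval counts records, and that the reader binary-searches the index, seeks straight to an exact hit (or else to the last smaller entry), then scans forward. The intervals 4 and 128 are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a MapFile with a sparse index.
// It is not Hadoop's code; it only reproduces the effect described above.
public class IndexIntervalDemo {
    // Feng's example file: keys 1..3, each with values 1..3, in sorted order.
    public static final int[] KEYS   = {1, 1, 1, 2, 2, 2, 3, 3, 3};
    public static final int[] VALUES = {1, 2, 3, 1, 2, 3, 1, 2, 3};

    // Writer side: record an index entry (key, position) for every
    // indexInterval-th record, regardless of where runs of equal keys start.
    public static List<int[]> buildIndex(int[] keys, int indexInterval) {
        List<int[]> index = new ArrayList<>();
        for (int pos = 0; pos < keys.length; pos++) {
            if (pos % indexInterval == 0) {
                index.add(new int[] {keys[pos], pos});
            }
        }
        return index;
    }

    // Reader side: binary-search the index; on an exact hit seek straight to
    // that entry, otherwise seek to the last entry with a smaller key.
    // Then scan forward and return the first record whose key matches.
    public static Integer get(int[] keys, int[] values, List<int[]> index, int key) {
        int seekEntry = -1;
        int lo = 0, hi = index.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int midKey = index.get(mid)[0];
            if (midKey < key) {
                seekEntry = mid;
                lo = mid + 1;
            } else if (midKey > key) {
                hi = mid - 1;
            } else {
                seekEntry = mid; // exact hit, possibly in the middle of a run
                break;
            }
        }
        int start = seekEntry < 0 ? 0 : index.get(seekEntry)[1];
        for (int pos = start; pos < keys.length; pos++) {
            if (keys[pos] == key) return values[pos];
            if (keys[pos] > key) return null; // key not present
        }
        return null;
    }

    public static void main(String[] args) {
        // Interval 4 indexes records 0, 4 and 8; record 4 is (2, 2), not (2, 1).
        System.out.println("interval 4:   get(2) = " + get(KEYS, VALUES, buildIndex(KEYS, 4), 2));   // 2
        // Interval 128 indexes only record 0, so the scan reaches (2, 1) first.
        System.out.println("interval 128: get(2) = " + get(KEYS, VALUES, buildIndex(KEYS, 128), 2)); // 1
    }
}
```

Under these assumptions the answer depends only on where the index entries happen to fall, which is why the same file can return the first value with one interval and a later value with another.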

 This does seem to be inconsistent with the fact that you are
 allowed to insert records with equal keys.  I suspect perhaps the developers
 meant for MapFile records to be uniquely keyed, but in
 MapFile.Writer.checkKey() they used a < where they intended a <=, or
 something like that.
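The question comes down to a single comparison in the writer: whether a key equal to the last one is accepted. A hypothetical Java sketch contrasting the two checks follows; the class name, the allowEqual flag and the exception type are assumptions, not Hadoop's MapFile.Writer.checkKey().

```java
// Hypothetical sketch of a writer-side ordering check; NOT Hadoop's
// MapFile.Writer.checkKey(), just the two comparisons side by side.
public class CheckKeyDemo {
    private final boolean allowEqual; // true: permissive check, false: unique keys only
    private String lastKey;           // null until the first key is appended

    public CheckKeyDemo(boolean allowEqual) {
        this.allowEqual = allowEqual;
    }

    public void checkKey(String key) {
        if (lastKey != null) {
            int c = lastKey.compareTo(key);
            // Permissive: reject only key < lastKey (c > 0).
            // Strict: also reject key == lastKey (c >= 0).
            boolean rejected = allowEqual ? c > 0 : c >= 0;
            if (rejected) {
                throw new IllegalArgumentException("key out of order: " + key + " after " + lastKey);
            }
        }
        lastKey = key;
    }

    public static void main(String[] args) {
        CheckKeyDemo permissive = new CheckKeyDemo(true);
        permissive.checkKey("1");
        permissive.checkKey("1"); // accepted: equal keys are allowed
        System.out.println("permissive check accepted a duplicate key");

        CheckKeyDemo strict = new CheckKeyDemo(false);
        strict.checkKey("1");
        try {
            strict.checkKey("1");
        } catch (IllegalArgumentException e) {
            System.out.println("strict check rejected it: " + e.getMessage());
        }
    }
}
```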

 On 11/27/06, Feng Jiang [EMAIL PROTECTED] wrote:
  Sorry, I made a typo :)
 
  I do have such a file, but why is the reader NOT positioned at the
  first entry of that key?
 
  On 11/28/06, Feng Jiang [EMAIL PROTECTED] wrote:
  
   In the MapFile.Writer.checkKey() method, identical keys are OK; only
   appending a new key that is less than the last key is rejected.
  
   I did have such a file, but why is the reader positioned at the
   first entry of that key?
  
   Best wishes,
  
   Feng
  
   On 11/28/06, Stefan Groschupf [EMAIL PROTECTED] wrote:
   
Hi,
   
Aren't keys in a map file unique? I'm surprised that you were able to
write such a file.
   
Stefan
   
On 27.11.2006, at 22:15, Feng Jiang wrote:
   
 Hi all,

 For example, I have a MapFile like this:

 K - V
 1 - 1
 1 - 2
 1 - 3
 2 - 1
 2 - 2
 2 - 3
 3 - 1
 3 - 2
 3 - 3

 When I call mapFile.get(2, value), the value is filled with 2,
 not 1.

 Is this a bug in MapFile? I think the reader should be positioned at
 the first entry of the named key. Am I right?

 Thanks and best regards,

 Feng Jiang
   
~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com
   
   
   
   
   
  
 
 





Re: Re: Re: MapFile.get() has a bug?

2006-11-28 Thread Feng Jiang

Great idea!!! Thank you so much!!!

Best wishes,

Feng



Re: MapFile.get() has a bug?

2006-11-28 Thread Doug Cutting

Albert Chern wrote:

 Every time the size of the map file hits a multiple of the index
 interval, an index entry is written.  Therefore, it is possible that
 an index entry is not added for the first occurrence of a key, but one
 of the later ones.  The reader will then seek to one of those instead
 of the first.

 This does seem to be inconsistent with the fact that you are
 allowed to insert records with equal keys.


Yes, I agree that this is confusing and arguably a bug.


 I suspect perhaps the developers
 meant for MapFile records to be uniquely keyed, but in
 MapFile.Writer.checkKey() they used a < where they intended a <=, or
 something like that.


I think what actually happened was that I originally coded it to
prohibit equal keys, then at some point found an application (somewhere
in Nutch) where equal keys were useful, and changed MapFile to support
them, not realizing the consequences.  Sigh.  I don't know whether Nutch
still relies on this or not.


MapFile could probably be fixed by changing the way the index is 
created, to write the location of the first instance of any run of equal 
keys.  We could also avoid recording two instances of equal keys in the 
index: for a long run of equal keys, we could wait until the key changes 
before emitting a new index entry.
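A sketch of the proposed index change, using a simplified, hypothetical in-memory model rather than an actual Hadoop patch. When an index entry is due, the writer waits until a new run of equal keys begins, so every entry points at the first record of its key and no key is indexed twice. The lookup binary-searches the index, seeks to an exact hit or the last smaller entry, and scans forward. The interval of 4 is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the proposed index fix; not Hadoop code.
public class RunAwareIndexDemo {
    public static final int[] KEYS   = {1, 1, 1, 2, 2, 2, 3, 3, 3};
    public static final int[] VALUES = {1, 2, 3, 1, 2, 3, 1, 2, 3};

    // Emit an index entry only at the start of a run of equal keys, and only
    // once at least indexInterval records have passed since the last entry.
    // A long run therefore defers the entry until the key changes.
    public static List<int[]> buildRunAwareIndex(int[] keys, int indexInterval) {
        List<int[]> index = new ArrayList<>();
        int sinceLastEntry = 0;
        for (int pos = 0; pos < keys.length; pos++) {
            boolean startsRun = pos == 0 || keys[pos] != keys[pos - 1];
            if (startsRun && (pos == 0 || sinceLastEntry >= indexInterval)) {
                index.add(new int[] {keys[pos], pos});
                sinceLastEntry = 0;
            }
            sinceLastEntry++;
        }
        return index;
    }

    // Binary-search the index, seek to an exact hit or the last smaller
    // entry, then scan forward to the first record with the requested key.
    public static Integer get(int[] keys, int[] values, List<int[]> index, int key) {
        int seekEntry = -1;
        int lo = 0, hi = index.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int midKey = index.get(mid)[0];
            if (midKey < key) {
                seekEntry = mid;
                lo = mid + 1;
            } else if (midKey > key) {
                hi = mid - 1;
            } else {
                seekEntry = mid; // exact hit: now always the first record of the run
                break;
            }
        }
        int start = seekEntry < 0 ? 0 : index.get(seekEntry)[1];
        for (int pos = start; pos < keys.length; pos++) {
            if (keys[pos] == key) return values[pos];
            if (keys[pos] > key) return null; // key not present
        }
        return null;
    }

    public static void main(String[] args) {
        List<int[]> index = buildRunAwareIndex(KEYS, 4);
        for (int[] entry : index) {
            System.out.println("index entry: key " + entry[0] + " at record " + entry[1]);
        }
        // Every lookup now returns the first value of its run.
        for (int key = 1; key <= 3; key++) {
            System.out.println("get(" + key + ") = " + get(KEYS, VALUES, index, key));
        }
    }
}
```

The trade-off is that index entries are no longer spaced exactly indexInterval records apart: a very long run of one key widens the gap before the next entry, so a lookup just past such a run may scan further.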


Doug