We have a 1 million record index that is about 6GB in size. We build
it in parallel without acts_as_ferret (AAF), so it's hard to comment on the
speed of your index build. However, I will say that I did need to manually
patch Ferret to handle large indexes properly: the offsets in index.c are
plain ints, which overflow once the index file grows past 2GB, so the
patch widens them to off_t.
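
In case it's useful, our build loop is roughly the following shape: each
worker process takes a slice of the id range and writes its own index
directory, and the shards get merged or searched together afterwards.
This is an illustrative sketch, not our actual code; Record and the
field names are made up:

require 'ferret'

# Build one shard; we run several of these at once, one process per
# id range, each writing to its own index directory.
def build_shard(dir, first_id, last_id, batch_size = 1000)
  index = Ferret::Index::Index.new(:path => dir, :create => true)
  id = first_id
  while id <= last_id
    rows = Record.find(:all, :order => "id",
                       :conditions => ["id >= ? AND id < ?", id, id + batch_size])
    rows.each do |r|
      index << {:id => r.id, :title => r.title, :body => r.body}
    end
    index.flush  # persist each batch so a crash costs at most one batch
    id += batch_size
  end
  index.optimize
  index.close
end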

Here is the diff for the off_t patch:

--- /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/ext/index.c
+++ index.c
@@ -1375,7 +1375,7 @@
      lazy_doc = lazy_doc_new(stored_cnt, fdt_in);
      for (i = 0; i < stored_cnt; i++) {
-        int start = 0, end, data_cnt;
+        off_t start = 0, end, data_cnt;
          field_num = is_read_vint(fdt_in);
          fi = fr->fis->fields[field_num];
          data_cnt = is_read_vint(fdt_in);
@@ -1449,7 +1449,7 @@
          if (store_offsets) {
              int num_positions = tv->offset_cnt = is_read_vint(fdt_in);
              Offset *offsets = tv->offsets = ALLOC_N(Offset, num_positions);
-            int offset = 0;
+            off_t offset = 0;
              for (i = 0; i < num_positions; i++) {
                  offsets[i].start = offset += is_read_vint(fdt_in);
                  offsets[i].end = offset += is_read_vint(fdt_in);
@@ -1683,8 +1683,8 @@
          int last_end = 0;
          os_write_vint(fdt_out, offset_count);  /* write shared prefix length */
          for (i = 0; i < offset_count; i++) {
-            int start = offsets[i].start;
-            int end = offsets[i].end;
+            off_t start = offsets[i].start;
+            off_t end = offsets[i].end;
              os_write_vint(fdt_out, start - last_end);
              os_write_vint(fdt_out, end - start);
              last_end = end;
@@ -4799,7 +4799,7 @@
  *
  ****************************************************************************/
-Offset *offset_new(int start, int end)
+Offset *offset_new(off_t start, off_t end)
{
      Offset *offset = ALLOC(Offset);
      offset->start = start;
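
To your question about picking up where the index left off:
rebuild_index always starts from scratch, but if you drive Ferret
directly and walk the table in primary-key order, you can checkpoint
your progress and restart from wherever the process died. A rough,
untested sketch; the checkpoint file and the Record model are made up:

require 'ferret'

CHECKPOINT = 'index/last_id'  # plain progress file, not part of Ferret

index = Ferret::Index::Index.new(:path => 'index/records')
last_id = File.exist?(CHECKPOINT) ? File.read(CHECKPOINT).to_i : 0

loop do
  batch = Record.find(:all, :conditions => ["id > ?", last_id],
                      :order => "id", :limit => 1000)
  break if batch.empty?
  batch.each { |r| index << {:id => r.id, :body => r.body} }
  index.flush                                  # make the batch durable
  last_id = batch.last.id
  File.open(CHECKPOINT, 'w') { |f| f << last_id }  # record progress
end

index.close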

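As for rebuilding non-destructively: nothing stops you from building
into a brand-new directory while searches keep hitting the old one,
then swapping the two with a rename once the build finishes. Something
like this, where the paths and the build_full_index helper are just
examples:

require 'fileutils'

new_dir  = 'index/records.new'
live_dir = 'index/records'

build_full_index(new_dir)  # your full build loop goes here

# rename is atomic on the same filesystem, so searchers flip from the
# old index to the new one in a single step
FileUtils.mv(live_dir, 'index/records.old') if File.exist?(live_dir)
FileUtils.mv(new_dir, live_dir)
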
On Aug 8, 2007, at 4:16 PM, Craig Jolicoeur wrote:

> I have a MySQL table with over 18 million records in it.  We are
> indexing about 10 fields in this table with ferret.
>
> I am having problems with the initial building of the index.  I created
> a rake task to run the "Model.rebuild_index" command in the background.
> That process ran fine for about 2.5 days before it just suddenly
> stopped.  The log/ferret_index.log file says it got to about 28% before
> ending.  I'm not sure if the process died because of something on my
> server or because of something related to ferret.
>
> It appears that it will take close to 10 days for the full index to be
> built with rebuild_index?  Is this normal for a table of this size?
> Also, is there a way to start where the index ended and update from
> there instead of having to rebuild the entire index from scratch?  I got
> about 28% of the way through so would like to not have to waste the 2.5
> days to rebuild that part again trying to get the full index 100% built.
>
> Also, is there a way that I can non-destructively rebuild the index since
> it didn't complete 100%?  Meaning, can I rebuild it without overwriting
> what is already there?  That way I can keep what I have to be searched
> while the rebuild takes place and then move that over the old index?
> I'm not running ferret as a DRb server so I don't know if I can.
>
> Also, is there a faster or better way that I can/should be building the
> index?  Will I have an issue with the index file sizes with a DB this
> size?
