Hi Tim,

Sure, I can try building the trunk version and see if that fixes it.

However, we are using 0.2.3 at work.  Do you recall if there was a
workaround for the issue?  It's not a big issue, but if there is a
workaround, I'll use it.
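
In the meantime, here is a minimal sketch of the paging loop with the
end-of-terms checks gathered in one place. TermPager, TermSource, allTerms
and inMemory are hypothetical names, not part of the Blur API. The sketch
assumes what the 0.2.3 output quoted below suggests: each page starts with
startTerm itself, and an empty string means no terms are left. It also
assumes Java 8 or later for the lambda. It does not address the server-side
ArrayIndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper for paging through terms; not part of the Blur API. */
class TermPager {

    /** Hypothetical stand-in for one call to Iface.terms(table, family, column, startTerm, size). */
    interface TermSource {
        List<String> fetch(String startTerm, short size);
    }

    /** Collects every term, assuming each page begins at startTerm (inclusive). */
    static List<String> allTerms(TermSource source, short pageSize) {
        if (pageSize < 2) {
            // With a page size of 1 every page would only repeat startTerm.
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> result = new ArrayList<>();
        String startTerm = "";
        while (true) {
            boolean progressed = false;
            for (String term : source.fetch(startTerm, pageSize)) {
                if (term.equals(startTerm)) {
                    continue; // boundary term repeated from the previous page
                }
                if (term.isEmpty()) {
                    return result; // assumed end marker: fewer terms left than requested
                }
                result.add(term);
                startTerm = term;
                progressed = true;
            }
            if (!progressed) {
                return result; // page was empty or held only startTerm
            }
        }
    }

    /** In-memory source for trying out the loop without a Blur cluster. */
    static TermSource inMemory(List<String> terms) {
        TreeSet<String> sorted = new TreeSet<>(terms);
        return (startTerm, size) -> {
            List<String> page = new ArrayList<>();
            for (String term : sorted.tailSet(startTerm, true)) {
                if (page.size() == size) {
                    break;
                }
                page.add(term);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("antbook", "detail", "exec", "http", "lucene");
        System.out.println(allTerms(inMemory(terms), (short) 2));
    }
}
```

With the real client, the TermSource would be a lambda that calls
blurClient.terms(tableName, family, column, startTerm, size) and rethrows
BlurException and TException as unchecked exceptions. On 0.2.3 a page size
of 2 is the only setting I have seen avoid the exception.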

Thanks,
-- Tom


On Sun, Oct 5, 2014 at 5:03 PM, Tim Williams wrote:

> Hi Tom,
> Are you comfortable trying out a trunk version?  If so, I'm wondering
> if you can reproduce this on trunk - as this seems similar to an issue
> recently resolved.
>
> --tim
>
>
> On Sun, Oct 5, 2014 at 4:47 PM, Tom Hood wrote:
> > Hi,
> >
> > I'm new to blur and have been spending a little time today learning the
> > 0.2.3 API.  I'm having trouble dumping the terms of a blur index.
> >
> > Here's some code that uses Iface.terms and sort of works (see below), but
> > has an issue depending on the size parameter passed to Iface.terms.
> >
> > It wasn't obvious to me how to detect the end-of-terms condition, so if
> > there's a cleaner way, please let me know.
> >
> >     public static void DumpTerms(Iface blurClient, String tableName)
> >         throws BlurException, TException
> >     {
> >         Schema schema = blurClient.schema(tableName);
> >         for (Map<String,ColumnDefinition> familyDef :
> >                  schema.getFamilies().values()) {
> >             for (ColumnDefinition columnDef : familyDef.values()) {
> >                 DumpTermsForColumn(blurClient, tableName, columnDef);
> >             }
> >         }
> >     }
> >
> >     public static void DumpTermsForColumn(Iface            blurClient,
> >                                           String           tableName,
> >                                           ColumnDefinition columnDef)
> >         throws BlurException, TException
> >     {
> >         String family = columnDef.getFamily();
> >         String column = columnDef.getColumnName();
> >         String type = columnDef.getFieldType();
> >
> >         System.out.println(columnDef);
> >         if (   !type.equals(TextFieldTypeDefinition.NAME)
> >             && !type.equals(StringFieldTypeDefinition.NAME)) {
> >             System.out.println(" WARNING: terms unavailable for type "
> >                                + type);
> >             return;
> >         }
> >
> >         String startTerm = "";
> >         int termCount = 0;
> >         // loop logic assumes this is at least 2
> >         final short termFetchSize = 20;
> >         while (true) {
> >             List<String> terms = blurClient.terms(tableName,
> >                                                   family,
> >                                                   column,
> >                                                   startTerm,
> >                                                   termFetchSize);
> >             if (   terms.isEmpty()
> >                 || (   terms.size() == 1
> >                     && terms.get(0).equals(startTerm))) {
> >                 return;
> >             }
> >             for (String term : terms) {
> >                 if (term.equals(startTerm)) {
> >                     // 1st term is startTerm on calls 2-N of
> >                     // blurClient.terms
> >                     continue;
> >                 }
> >                 if (term.isEmpty()) {
> >                     // empty string returned when termFetchSize > terms
> >                     // left
> >                     return;
> >                 }
> >                 startTerm = term;
> >                 long termFreq = blurClient.recordFrequency(tableName,
> >                                                            family,
> >                                                            column,
> >                                                            term);
> >                 System.out.println("    term " + ++termCount
> >                                    + ": [" + term + "] freq="
> >                                    + termFreq);
> >             }
> >         }
> >     }
> >
> > ColumnDefinition(family:technology, columnName:author, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >     term 1: [andy] freq=1
> >     term 2: [beck] freq=1
> >     term 3: [dave] freq=1
> >     term 4: [douglas] freq=1
> >     term 5: [erik] freq=2
> >     term 6: [gospodnetic] freq=1
> >     term 7: [hatcher] freq=2
> >     term 8: [hofstadter] freq=1
> >     term 9: [howard] freq=1
> >     term 10: [hunt] freq=1
> >     term 11: [husted] freq=1
> >     term 12: [kent] freq=1
> >     term 13: [lewis] freq=1
> >     term 14: [loughran] freq=1
> >     term 15: [massol] freq=1
> >     term 16: [otis] freq=1
> >     term 17: [papert] freq=1
> >     term 18: [seymour] freq=1
> >     term 19: [ship] freq=1
> >     term 20: [steve] freq=1
> >     term 21: [ted] freq=1
> >     term 22: [thomas] freq=1
> >     term 23: [vincent] freq=1
> > ColumnDefinition(family:technology, columnName:title, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >     term 1: [action] freq=3
> >     term 2: [an] freq=1
> >     term 3: [ant] freq=1
> >     term 4: [bach] freq=1
> >     term 5: [braid] freq=1
> >     term 6: [development] freq=1
> >     term 7: [escher] freq=1
> >     term 8: [eternal] freq=1
> >     term 9: [explained] freq=1
> >     term 10: [extreme] freq=1
> >     term 11: [g] freq=1
> >     term 12: [golden] freq=1
> >     term 13: [in] freq=3
> >     term 14: [java] freq=1
> >     term 15: [junit] freq=1
> >     term 16: [lucene] freq=1
> >     term 17: [mindstorms] freq=1
> >     term 18: [pragmatic] freq=1
> >     term 19: [programmer] freq=1
> >     term 20: [programming] freq=1
> >     term 21: [tapestry] freq=1
> >     term 22: [the] freq=1
> >     term 23: [u00f6del] freq=1
> >     term 24: [with] freq=1
> > ColumnDefinition(family:technology, columnName:pubmonth, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >     term 1: [197903] freq=1
> >     term 2: [198001] freq=1
> >     term 3: [199910] freq=2
> >     term 4: [200208] freq=1
> >     term 5: [200310] freq=1
> >     term 6: [200403] freq=1
> >     term 7: [200406] freq=1
> > ColumnDefinition(family:technology, columnName:subject, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >     term 1: [agile] freq=2
> >     term 2: [ant] freq=1
> >     term 3: [apache] freq=1
> >     term 4: [artificial] freq=1
> >     term 5: [build] freq=1
> >     term 6: [children] freq=1
> >     term 7: [components] freq=1
> >     term 8: [computers] freq=1
> >     term 9: [developer] freq=1
> >     term 10: [development] freq=2
> >     term 11: [driven] freq=1
> >     term 12: [education] freq=1
> >     term 13: [extreme] freq=1
> >     term 14: [ideas] freq=1
> >     term 15: [intelligence] freq=1
> >     term 16: [interface] freq=1
> >     term 17: [jakarta] freq=1
> >     term 18: [java] freq=1
> >     term 19: [junit] freq=2
> >     term 20: [logo] freq=1
> >     term 21: [lucene] freq=1
> >     term 22: [mathematics] freq=1
> >     term 23: [methodology] freq=2
> >     term 24: [mock] freq=1
> >     term 25: [music] freq=1
> >     term 26: [number] freq=1
> >     term 27: [objects] freq=1
> >     term 28: [powerful] freq=1
> >     term 29: [pragmatic] freq=1
> >     term 30: [programming] freq=1
> >     term 31: [search] freq=1
> >     term 32: [tapestry] freq=1
> >     term 33: [test] freq=1
> >     term 34: [testing] freq=1
> >     term 35: [theory] freq=1
> >     term 36: [tool] freq=1
> >     term 37: [tools] freq=1
> >     term 38: [unit] freq=1
> >     term 39: [user] freq=1
> > ColumnDefinition(family:technology, columnName:isbn, subColumnName:null,
> > fieldLessIndexed:false, fieldType:string, properties:null, sortable:false)
> >     term 1: [020161622X] freq=1
> >     term 2: [0201616416] freq=1
> >     term 3: [0465026567] freq=1
> >     term 4: [0465046290] freq=1
> >     term 5: [1930110588] freq=1
> >     term 6: [1930110995] freq=1
> >     term 7: [1932394117] freq=1
> >     term 8: [tbd] freq=1
> > ColumnDefinition(family:technology, columnName:url, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >     term 1: [0201616416] freq=1
> >     term 2: [0465026567] freq=1
> >     term 3: [antbook] freq=1
> >     term 4: [detail] freq=2
> >     term 5: [exec] freq=2
> >     term 6: [http] freq=8
> >     term 7: [index.shtml] freq=1
> >     term 8: [lewisship] freq=1
> >     term 9: [lucene] freq=1
> >     term 10: [massol] freq=1
> >     term 11: [obidos] freq=2
> >     term 12: [ppbook] freq=1
> >     term 13: [tg] freq=2
> >     term 14: [www.amazon.com] freq=2
> >     term 15: [www.manning.com] freq=4
> >     term 16: [www.papert.org] freq=1
> >     term 17: [www.pragmaticprogrammer.com] freq=1
> > Exception in thread "main" BlurException(message:Call execution exception
> > [[lia, technology, url, www.pragmaticprogrammer.com, 20]],
> > stackTraceStr:java.lang.ArrayIndexOutOfBoundsException: 128
> >     at org.apache.lucene.store.ByteArrayDataInput.readVInt(ByteArrayDataInput.java:104)
> >     at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextLeaf(BlockTreeTermsReader.java:2467)
> >     at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next(BlockTreeTermsReader.java:2459)
> >     at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2139)
> >     at org.apache.blur.index.ExitableReader$ExitableTermsEnum.next(ExitableReader.java:233)
> >     at org.apache.blur.manager.IndexManager.terms(IndexManager.java:1031)
> >     at org.apache.blur.manager.IndexManager$9.call(IndexManager.java:982)
> >     at org.apache.blur.manager.IndexManager$9.call(IndexManager.java:976)
> >     at org.apache.blur.utils.ForkJoin$2.call(ForkJoin.java:63)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at org.apache.blur.concurrent.ThreadWatcher$ThreadWatcherExecutorService$1.run(ThreadWatcher.java:127)
> >     at org.apache.blur.concurrent.BlurThreadPoolExecutor$1.run(BlurThreadPoolExecutor.java:83)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> >     at java.lang.Thread.run(Thread.java:662)
> > , errorType:UNKNOWN)
> >     at org.apache.blur.thrift.generated.Blur$terms_result$terms_resultStandardScheme.read(Blur.java:26728)
> >     at org.apache.blur.thrift.generated.Blur$terms_result$terms_resultStandardScheme.read(Blur.java:26696)
> >     at org.apache.blur.thrift.generated.Blur$terms_result.read(Blur.java:26638)
> >     at org.apache.blur.thirdparty.thrift_0_9_0.TServiceClient.receiveBase(TServiceClient.java:78)
> >     at org.apache.blur.thrift.generated.Blur$Client.recv_terms(Blur.java:1212)
> >     at org.apache.blur.thrift.generated.SafeClientGen.recv_terms(SafeClientGen.java:508)
> >     at org.apache.blur.thrift.generated.Blur$Client.terms(Blur.java:1195)
> >     at org.apache.blur.thrift.generated.SafeClientGen.terms(SafeClientGen.java:942)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler$1.call(BlurClient.java:60)
> >     at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler$1.call(BlurClient.java:56)
> >     at org.apache.blur.thrift.AbstractCommand.call(AbstractCommand.java:62)
> >     at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:197)
> >     at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:56)
> >     at com.sun.proxy.$Proxy0.terms(Unknown Source)
> >     at hoodware.sandbox.blur.BlurIndexMain.DumpTermsForColumn(BlurIndexMain.java:88)
> >     at hoodware.sandbox.blur.BlurIndexMain.DumpTerms(BlurIndexMain.java:64)
> >     at hoodware.sandbox.blur.BlurIndexMain.main(BlurIndexMain.java:38)
> >
> > The code works if I change termFetchSize to 2 instead of 20.
> >
> > The command "blur terms lia technology.url" will get the same exception.
> >
> > The command "blur terms lia technology.url -s2" will not get the exception,
> > but goes into an infinite loop after it outputs:
> > "- |www.pragmaticprogrammer.com "
> >
> > Attached is the csv file that I loaded into an empty table.  It's a
> > reformatted version of the Lucene In Action book's sample data (taken
> > from the data directory in
> > http://www.manning-source.com/books/hatcher2/LuceneInAction.zip).
> >
> > I created the table with the commands:
> >
> > hadoop fs -mkdir lia_input
> > hadoop fs -copyFromLocal ~/projects/lucene/LuceneInAction.csv lia_input
> > hadoop fs -mkdir tables
> > blur create -t lia -c 2 -l tables/lia
> >
> > foreach family (health technology philosophy education)
> >     blur definecolumn lia $family title text
> >     blur definecolumn lia $family isbn string
> >     blur definecolumn lia $family author text
> > #    blur definecolumn lia $family pubmonth date -p dateFormat yyyyMM
> >     # pubmonth must be text for Blur.Iface.terms
> >     blur definecolumn lia $family pubmonth text
> >     blur definecolumn lia $family subject text
> >     blur definecolumn lia $family url text
> > end
> >
> > blur csvloader -c localhost:40010 -A -a -t lia -i lia_input -s';' \
> >     -d 'health title isbn author pubmonth subject url' \
> >     -d 'technology title isbn author pubmonth subject url' \
> >     -d 'philosophy title isbn author pubmonth subject url' \
> >     -d 'education title isbn author pubmonth subject url'
> >
> > Please let me know if you have any ideas on what I'm doing wrong.
> >
> > Thanks,
> > -- Tom
> >
> >
>
