Hi Tim,

Sure, I can try building the trunk version and see if that fixes it.
However, we are using 0.2.3 at work. Do you recall if there was a workaround
for the issue? It's not a big issue, but if there is a workaround, I'll use it.

Thanks,
-- Tom

On Sun, Oct 5, 2014 at 5:03 PM, Tim Williams <[email protected]> wrote:

> Hi Tom,
> Are you comfortable trying out a trunk version? If so, I'm wondering
> if you can reproduce this on trunk, as this seems similar to an issue
> recently resolved.
>
> --tim
>
>
> On Sun, Oct 5, 2014 at 4:47 PM, Tom Hood <[email protected]> wrote:
> > Hi,
> >
> > I'm new to Blur and have been spending a little time today learning the
> > 0.2.3 API. I'm having trouble dumping the terms of a Blur index.
> >
> > Here's some code that uses Iface.terms that sort of works (see below), but
> > has an issue depending on the size parameter passed to Iface.terms.
> >
> > It wasn't obvious to me how to detect the end-of-terms condition, so if
> > there's a cleaner way, please let me know.
> >
> > public static void DumpTerms(Iface blurClient, String tableName)
> >     throws BlurException, TException
> > {
> >     Schema schema = blurClient.schema(tableName);
> >     for (Map<String,ColumnDefinition> familyDef :
> >              schema.getFamilies().values()) {
> >         for (ColumnDefinition columnDef : familyDef.values()) {
> >             DumpTermsForColumn(blurClient, tableName, columnDef);
> >         }
> >     }
> > }
> >
> > public static void DumpTermsForColumn(Iface blurClient,
> >                                       String tableName,
> >                                       ColumnDefinition columnDef)
> >     throws BlurException, TException
> > {
> >     String family = columnDef.getFamily();
> >     String column = columnDef.getColumnName();
> >     String type = columnDef.getFieldType();
> >
> >     System.out.println(columnDef);
> >     if (   !type.equals(TextFieldTypeDefinition.NAME)
> >         && !type.equals(StringFieldTypeDefinition.NAME)) {
> >         System.out.println("  WARNING: terms unavailable for type " + type);
> >         return;
> >     }
> >
> >     String startTerm = "";
> >     int termCount = 0;
> >     final short termFetchSize = 20; // loop logic assumes this is at least 2
> >     while (true) {
> >         List<String> terms = blurClient.terms(tableName,
> >                                               family,
> >                                               column,
> >                                               startTerm,
> >                                               termFetchSize);
> >         if (   terms.isEmpty()
> >             || (terms.size() == 1 && terms.get(0).equals(startTerm))) {
> >             return;
> >         }
> >         for (String term : terms) {
> >             if (term.equals(startTerm)) {
> >                 // 1st term is startTerm on calls 2-N of blurClient.terms
> >                 continue;
> >             }
> >             if (term.isEmpty()) {
> >                 // empty string returned when termFetchSize > terms left
> >                 return;
> >             }
> >             startTerm = term;
> >             long termFreq = blurClient.recordFrequency(tableName,
> >                                                        family,
> >                                                        column,
> >                                                        term);
> >             System.out.println("  term " + ++termCount
> >                                + ": [" + term + "] freq=" + termFreq);
> >         }
> >     }
> > }
> >
> > ColumnDefinition(family:technology, columnName:author, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >   term 1: [andy] freq=1
> >   term 2: [beck] freq=1
> >   term 3: [dave] freq=1
> >   term 4: [douglas] freq=1
> >   term 5: [erik] freq=2
> >   term 6: [gospodnetic] freq=1
> >   term 7: [hatcher] freq=2
> >   term 8: [hofstadter] freq=1
> >   term 9: [howard] freq=1
> >   term 10: [hunt] freq=1
> >   term 11: [husted] freq=1
> >   term 12: [kent] freq=1
> >   term 13: [lewis] freq=1
> >   term 14: [loughran] freq=1
> >   term 15: [massol] freq=1
> >   term 16: [otis] freq=1
> >   term 17: [papert] freq=1
> >   term 18: [seymour] freq=1
> >   term 19: [ship] freq=1
> >   term 20: [steve] freq=1
> >   term 21: [ted] freq=1
> >   term 22: [thomas] freq=1
> >   term 23: [vincent] freq=1
> > ColumnDefinition(family:technology, columnName:title, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >   term 1: [action] freq=3
> >   term 2: [an] freq=1
> >   term 3: [ant] freq=1
> >   term 4: [bach] freq=1
> >   term 5: [braid] freq=1
> >   term 6: [development] freq=1
> >   term 7: [escher] freq=1
> >   term 8: [eternal] freq=1
> >   term 9: [explained] freq=1
> >   term 10: [extreme] freq=1
> >   term 11: [g] freq=1
> >   term 12: [golden] freq=1
> >   term 13: [in] freq=3
> >   term 14: [java] freq=1
> >   term 15: [junit] freq=1
> >   term 16: [lucene] freq=1
> >   term 17: [mindstorms] freq=1
> >   term 18: [pragmatic] freq=1
> >   term 19: [programmer] freq=1
> >   term 20: [programming] freq=1
> >   term 21: [tapestry] freq=1
> >   term 22: [the] freq=1
> >   term 23: [u00f6del] freq=1
> >   term 24: [with] freq=1
> > ColumnDefinition(family:technology, columnName:pubmonth, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >   term 1: [197903] freq=1
> >   term 2: [198001] freq=1
> >   term 3: [199910] freq=2
> >   term 4: [200208] freq=1
> >   term 5: [200310] freq=1
> >   term 6: [200403] freq=1
> >   term 7: [200406] freq=1
> > ColumnDefinition(family:technology, columnName:subject, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >   term 1: [agile] freq=2
> >   term 2: [ant] freq=1
> >   term 3: [apache] freq=1
> >   term 4: [artificial] freq=1
> >   term 5: [build] freq=1
> >   term 6: [children] freq=1
> >   term 7: [components] freq=1
> >   term 8: [computers] freq=1
> >   term 9: [developer] freq=1
> >   term 10: [development] freq=2
> >   term 11: [driven] freq=1
> >   term 12: [education] freq=1
> >   term 13: [extreme] freq=1
> >   term 14: [ideas] freq=1
> >   term 15: [intelligence] freq=1
> >   term 16: [interface] freq=1
> >   term 17: [jakarta] freq=1
> >   term 18: [java] freq=1
> >   term 19: [junit] freq=2
> >   term 20: [logo] freq=1
> >   term 21: [lucene] freq=1
> >   term 22: [mathematics] freq=1
> >   term 23: [methodology] freq=2
> >   term 24: [mock] freq=1
> >   term 25: [music] freq=1
> >   term 26: [number] freq=1
> >   term 27: [objects] freq=1
> >   term 28: [powerful] freq=1
> >   term 29: [pragmatic] freq=1
> >   term 30: [programming] freq=1
> >   term 31: [search] freq=1
> >   term 32: [tapestry] freq=1
> >   term 33: [test] freq=1
> >   term 34: [testing] freq=1
> >   term 35: [theory] freq=1
> >   term 36: [tool] freq=1
> >   term 37: [tools] freq=1
> >   term 38: [unit] freq=1
> >   term 39: [user] freq=1
> > ColumnDefinition(family:technology, columnName:isbn, subColumnName:null,
> > fieldLessIndexed:false, fieldType:string, properties:null, sortable:false)
> >   term 1: [020161622X] freq=1
> >   term 2: [0201616416] freq=1
> >   term 3: [0465026567] freq=1
> >   term 4: [0465046290] freq=1
> >   term 5: [1930110588] freq=1
> >   term 6: [1930110995] freq=1
> >   term 7: [1932394117] freq=1
> >   term 8: [tbd] freq=1
> > ColumnDefinition(family:technology, columnName:url, subColumnName:null,
> > fieldLessIndexed:false, fieldType:text, properties:null, sortable:false)
> >   term 1: [0201616416] freq=1
> >   term 2: [0465026567] freq=1
> >   term 3: [antbook] freq=1
> >   term 4: [detail] freq=2
> >   term 5: [exec] freq=2
> >   term 6: [http] freq=8
> >   term 7: [index.shtml] freq=1
> >   term 8: [lewisship] freq=1
> >   term 9: [lucene] freq=1
> >   term 10: [massol] freq=1
> >   term 11: [obidos] freq=2
> >   term 12: [ppbook] freq=1
> >   term 13: [tg] freq=2
> >   term 14: [www.amazon.com] freq=2
> >   term 15: [www.manning.com] freq=4
> >   term 16: [www.papert.org] freq=1
> >   term 17: [www.pragmaticprogrammer.com] freq=1
> > Exception in thread "main" BlurException(message:Call execution exception
> > [[lia, technology, url, www.pragmaticprogrammer.com, 20]],
> > stackTraceStr:java.lang.ArrayIndexOutOfBoundsException: 128
> >     at org.apache.lucene.store.ByteArrayDataInput.readVInt(ByteArrayDataInput.java:104)
> >     at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextLeaf(BlockTreeTermsReader.java:2467)
> >     at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next(BlockTreeTermsReader.java:2459)
> >     at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2139)
> >     at org.apache.blur.index.ExitableReader$ExitableTermsEnum.next(ExitableReader.java:233)
> >     at org.apache.blur.manager.IndexManager.terms(IndexManager.java:1031)
> >     at org.apache.blur.manager.IndexManager$9.call(IndexManager.java:982)
> >     at org.apache.blur.manager.IndexManager$9.call(IndexManager.java:976)
> >     at org.apache.blur.utils.ForkJoin$2.call(ForkJoin.java:63)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at org.apache.blur.concurrent.ThreadWatcher$ThreadWatcherExecutorService$1.run(ThreadWatcher.java:127)
> >     at org.apache.blur.concurrent.BlurThreadPoolExecutor$1.run(BlurThreadPoolExecutor.java:83)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> >     at java.lang.Thread.run(Thread.java:662)
> > , errorType:UNKNOWN)
> >     at org.apache.blur.thrift.generated.Blur$terms_result$terms_resultStandardScheme.read(Blur.java:26728)
> >     at org.apache.blur.thrift.generated.Blur$terms_result$terms_resultStandardScheme.read(Blur.java:26696)
> >     at org.apache.blur.thrift.generated.Blur$terms_result.read(Blur.java:26638)
> >     at org.apache.blur.thirdparty.thrift_0_9_0.TServiceClient.receiveBase(TServiceClient.java:78)
> >     at org.apache.blur.thrift.generated.Blur$Client.recv_terms(Blur.java:1212)
> >     at org.apache.blur.thrift.generated.SafeClientGen.recv_terms(SafeClientGen.java:508)
> >     at org.apache.blur.thrift.generated.Blur$Client.terms(Blur.java:1195)
> >     at org.apache.blur.thrift.generated.SafeClientGen.terms(SafeClientGen.java:942)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler$1.call(BlurClient.java:60)
> >     at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler$1.call(BlurClient.java:56)
> >     at org.apache.blur.thrift.AbstractCommand.call(AbstractCommand.java:62)
> >     at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:197)
> >     at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:56)
> >     at com.sun.proxy.$Proxy0.terms(Unknown Source)
> >     at hoodware.sandbox.blur.BlurIndexMain.DumpTermsForColumn(BlurIndexMain.java:88)
> >     at hoodware.sandbox.blur.BlurIndexMain.DumpTerms(BlurIndexMain.java:64)
> >     at hoodware.sandbox.blur.BlurIndexMain.main(BlurIndexMain.java:38)
> >
> > The code works if I change termFetchSize to 2 instead of 20.
> >
> > The command "blur terms lia technology.url" fails with the same exception.
> >
> > The command "blur terms lia technology.url -s2" does not throw the
> > exception, but goes into an infinite loop after it outputs: "-
> > |www.pragmaticprogrammer.com "
> >
> > Attached is the CSV file that I loaded into an empty table. It's a
> > reformatted version of the Lucene in Action book's sample data (taken from
> > the data directory in
> > http://www.manning-source.com/books/hatcher2/LuceneInAction.zip).
> >
> > I created the table with the commands:
> >
> > hadoop fs -mkdir lia_input
> > hadoop fs -copyFromLocal ~/projects/lucene/LuceneInAction.csv lia_input
> > hadoop fs -mkdir tables
> > blur create -t lia -c 2 -l tables/lia
> >
> > foreach family (health technology philosophy education)
> >   blur definecolumn lia $family title text
> >   blur definecolumn lia $family isbn string
> >   blur definecolumn lia $family author text
> >   # blur definecolumn lia $family pubmonth date -p dateFormat yyyyMM
> >   blur definecolumn lia $family pubmonth text # must be text for Blur.Iface.terms
> >   blur definecolumn lia $family subject text
> >   blur definecolumn lia $family url text
> > end
> >
> > blur csvloader -c localhost:40010 -A -a -t lia -i lia_input -s';' \
> >   -d 'health title isbn author pubmonth subject url' \
> >   -d 'technology title isbn author pubmonth subject url' \
> >   -d 'philosophy title isbn author pubmonth subject url' \
> >   -d 'education title isbn author pubmonth subject url'
> >
> > Please let me know if you have any ideas on what I'm doing wrong.
> >
> > Thanks,
> > -- Tom
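On the end-of-terms question raised in the thread: one way to structure the paging loop is to treat the last term returned as a cursor and stop as soon as a page contributes nothing new. That single test covers the empty list, a page holding only the echoed start term, and the empty-string padding Tom observed. This is a minimal sketch under stated assumptions, not Blur's API: `TermPage` and `allTerms` are hypothetical names, with `TermPage` standing in for `Iface.terms` with the table, family, and column already bound. The fallback to a page size of 2 after a failed request only mirrors Tom's observation that his loop works with `termFetchSize` set to 2; it does not fix the server-side `ArrayIndexOutOfBoundsException` and may not avoid it on other data.

```java
import java.util.ArrayList;
import java.util.List;

public class TermPaging {

    /**
     * Hypothetical hook standing in for Iface.terms(table, family, column, startWith, size)
     * with table, family and column already bound. Not part of the Blur API.
     */
    public interface TermPage {
        List<String> fetch(String startWith, short size) throws Exception;
    }

    /**
     * Pages through all terms, using the last term seen as the cursor for the next call.
     * Stops when a page adds no new term (empty page, only the echoed cursor, or only
     * empty-string padding). If a request fails, retries the same cursor with a page
     * size of 2 and keeps that size for the rest of the walk; a failure at size 2 is rethrown.
     */
    public static List<String> allTerms(TermPage source, short pageSize) throws Exception {
        if (pageSize < 2) {
            // With size 1 the echoed cursor would fill every page after the first.
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> result = new ArrayList<>();
        String cursor = "";
        short size = pageSize;
        while (true) {
            List<String> page;
            try {
                page = source.fetch(cursor, size);
            } catch (Exception e) {
                if (size == 2) {
                    throw e; // already at the smallest usable size; give up
                }
                size = 2; // assumed workaround: shrink the page and retry this cursor
                continue;
            }
            boolean advanced = false;
            for (String term : page) {
                // Skip the echoed start term and the empty-string padding
                if (term.isEmpty() || term.equals(cursor)) {
                    continue;
                }
                result.add(term);
                cursor = term;
                advanced = true;
            }
            if (!advanced) {
                return result; // nothing new came back: end of the term list
            }
        }
    }
}
```

Wired to a real 0.2.3 client, the hook would presumably be `(start, size) -> blurClient.terms(tableName, family, column, start, size)`, matching the call in `DumpTermsForColumn`, with `recordFrequency` called per collected term afterwards. Skipping terms by equality rather than `String.compareTo` avoids assuming that Java's UTF-16 string order matches Lucene's byte order for terms.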
