Sorry for taking so long to respond.

On Fri, Dec 13, 2013 at 7:41 AM, Naresh Yadav <[email protected]> wrote:
> Hi aaron,
>
> I am little confused on problem of immediate visibility of data. My case i
> need guaranteed immediate visibility of index.

This is normal behavior for Lucene-based technologies (for the most part). There is a certain amount of time after the data is posted to an index writer before the data becomes searchable. We are going to try to improve this behavior in 0.3, but more than likely there will always be some sort of delay.

> I tried with flag on the RowMutation object called waitForVisiblity and set
> it true then my same program for inserting
> 17000 rows started taking more than 5 minutes and even not completing
> fully, which was before taking 1minute. It starts throwing
> exception of All connections bad after 5-6 minutes..........If i run with
> waitForVisiblity=false it works fine in a minute.

With only 17,000 rows I would try using the batch update version of the mutate, potentially with batch sizes of 1,000, depending on the size of your rows. As far as the exception goes, if you could send the stack trace back to the list we can try to debug and fix what's going on. It may already have been fixed in the unreleased 0.2.2.

I think that something like transactions would likely help in this situation. Meaning:

Load all your data.
Commit (or Rollback)
After commit everything is visible.

I have been thinking about adding something like this to Blur for a while, but with trying to get 0.2.2 production ready I haven't had time to work on new features.

> Second question is regarding backups..i tried create snapshot and it was
> success.. I was eager to know if this i can see in windows
> filesystem and copy it to move to another machine and import(no command
> found for this) there in hdfs.

Snapshots merely freeze the index at a particular point in time and prevent those files from being deleted.
In a future release there will be a way to perform MapReduce over these snapshots; you will also be able to control the index data through snapshots and perform backups. For now, unless you write some code to use them, they aren't useful.

> Thanks
> Naresh
>
> On Fri, Dec 6, 2013 at 6:33 PM, Aaron McCurry <[email protected]> wrote:
>
> > On Fri, Dec 6, 2013 at 7:54 AM, Naresh Yadav <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I have few doubts related to blur please help me on this :
> > >
> > > 1. Is there a way i can see all rows of data in a blur table ??? did not
> > > find any blur shell command..
> >
> > This will give you all the rows.
> >
> > query <tablename> *
> >
> > > 2. Is delete of data possible with where clause as query (similar to query
> > > command)?? I want to delete all data by matching two columns values through
> > > blur shell..
> >
> > Not yet. https://issues.apache.org/jira/browse/BLUR-130
> >
> > This shouldn't be difficult to add.
> >
> > > 3.After storing 17000 rows then i run queries to get each one then that
> > > returned only 16900 rows...After 5 mins i again run queries to get each one
> > > then returned all 17000 rows.........Is there solution for this ?? In my
> > > cased just after inserting data, i need to immediately run query over it.
> >
> > There is a delay on visibility of data within Blur. I believe the default
> > for a given table is 3 seconds; this can be configured by changing this
> > setting:
> >
> > blur.shard.time.between.refreshs=3000
> >
> > In the table properties, or in the blur-site.properties file.
> >
> > Be aware that decreasing this time will also decrease the speed at which
> > mutates can occur. Also, there is a flag on the RowMutation object called
> > waitForVisiblity; if this is set to true the mutate command will not return
> > until the data is searchable. NOTE: This will slow things down!
> > So only do this if you have to wait.
> >
> > Hope this helps.
> >
> > Aaron
> >
> > > Thanks,
> > > Naresh
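
As a rough illustration of the batching suggestion above (17,000 rows sent in batches of about 1,000), here is a minimal Java sketch. It only shows the chunking logic. The class `BatchSketch`, the methods `partition` and `sendInBatches`, and the `sender` callback are hypothetical names made up for this example and are not part of the Blur API; in a real program the callback would wrap whatever batch mutate call your Blur client version provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Blur: chunks a list of pending
// mutations and hands each chunk to a caller-supplied sender.
class BatchSketch {

    // Splits items into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Passes each batch to the sender in order; returns how many batches were sent.
    // ASSUMPTION: the sender stands in for the client's batch mutate call.
    static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> sender) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            sender.accept(batch);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        // Integers stand in for row mutations in this sketch.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 17000; i++) {
            rows.add(i);
        }
        int sent = sendInBatches(rows, 1000,
                batch -> System.out.println("sending " + batch.size() + " rows"));
        System.out.println(sent + " batches");
    }
}
```

The batch size is a parameter so it can be tuned to the size of the rows, as suggested above; 1,000 is only a starting point.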
