Re: Accumulo on S3

2020-04-03 Thread Josh Elser
It sounds like you're running into the known S3 consistency issues. However, I don't know whether EMRFS is supposed to support all of the things that Accumulo requires. I would assume that EMRFS should be bridging the gap from S3 (a blobstore) to a consistent, distributed FileSystem that

Re: Accumulo 1.7 tserver TTransportException errors every minute

2019-12-09 Thread Josh Elser
Yeah, this is an annoying combination of the ambari-agent and Thrift. Good on you to get to the bottom of it, and thanks for commenting back to the list! The agent opens a socket to the Thrift service to do the Ambari "port check", but sends no data (as it's just checking network
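
To illustrate the behaviour described above, here is a minimal sketch of a "port check" that connects and closes without sending any data. It uses only plain sockets, not Thrift or ambari-agent; the function names are hypothetical. The point is that the server side sees an immediate end-of-stream where it expected a request, which is the kind of condition a Thrift server reports as a TTransportException.

```python
import socket
import threading


def serve_once(server_sock, results):
    """Accept one connection and try to read a request from it."""
    conn, _ = server_sock.accept()
    with conn:
        # recv() returns b"" when the peer closed without sending anything
        results.append(conn.recv(4096))


def port_check(host, port):
    """Hypothetical stand-in for a monitoring port check: connect, send nothing, close."""
    with socket.create_connection((host, port), timeout=5):
        pass


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    results = []
    worker = threading.Thread(target=serve_once, args=(server, results))
    worker.start()
    port_check("127.0.0.1", port)
    worker.join(timeout=5)
    server.close()
    return results
```

Calling demo() returns what the server read from the checker's connection: a single empty read, i.e. end of file before any request bytes arrived.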

Re: Thrift TTransportException.END_OF_FILE

2019-07-15 Thread Josh Elser
it out. There weren't any obvious errors around hardware issues. Is it likely that the TTransportException and commits held are related? On Fri, 12 Jul 2019, 18:56 Josh Elser <els...@apache.org> wrote: "Commits are held" can be for a couple of different

Re: Thrift TTransportException.END_OF_FILE

2019-07-12 Thread Josh Elser
"Commits are held" can be for a couple of different reasons, some from within Accumulo and some from outside. In general, there is an expected ordering of mutations that a TabletServer has to apply. A "commit" here is the application of some mutations by a TabletServer to the memory map and

Re: Kerberos Ticket Renewal (when not updating Hadoop user)

2019-06-14 Thread Josh Elser
. every call to next() on a scanner, or would wrapping the connector call to create it be enough? Thanks, Emilio On 6/13/19 4:09 PM, Josh Elser wrote: Yes. Anything in which you're interacting with Accumulo (read-as, anything that's going to execute an RPC to talk to the Master

Re: Kerberos Ticket Renewal (when not updating Hadoop user)

2019-06-13 Thread Josh Elser
... James On Thu, 13 Jun 2019 at 20:21, Josh Elser wrote: Hi James, A thread calling checkTGTAndReloginFromKeytab() is still what you want for renewals. Just make sure you wrap that in a UGI.doAs() for the user whose ticket you want to renew. In general, you just want to wrap any Accumulo-rel

Re: Kerberos Ticket Renewal (when not updating Hadoop user)

2019-06-13 Thread Josh Elser
Hi James, A thread calling checkTGTAndReloginFromKeytab() is still what you want for renewals. Just make sure you wrap that in a UGI.doAs() for the user whose ticket you want to renew. In general, you just want to wrap any Accumulo-related calls in a doAs() to avoid fallback onto the

Re: Multiple get user authorization requests during a table scan

2019-06-12 Thread Josh Elser
Sky, Yes, every request to a Tablet needs to be authorized. The rabbit hole (complexity) goes a bit further which, I believe, could result in this value being even higher than this. This in mind: what is your end goal? What are you trying to figure out? Authorizations are cached by Accumulo

NoSQL Day on May 21st

2019-05-09 Thread Josh Elser
Assuming most of you are local to the Washington D.C. area, NoSQL day is fast approaching. If you've not already signed up, please check out the agenda and consider joining us for a fun and technical day with lots of talks from Apache committers and big names in industry:

2 weeks remaining for NoSQL Day abstract submission

2019-04-04 Thread Josh Elser
There are just *two weeks* remaining to submit abstracts for NoSQL Day 2019, in Washington D.C. on May 21st. Abstracts are due April 19th. https://dataworkssummit.com/nosql-day-2019/ Abstracts don't need to be more than a paragraph or two. Please take the time sooner rather than later to submit your

Re: Accumulo/Zookeeper Kerberos Integration

2019-03-27 Thread Josh Elser
By default, ZooKeeper allows fallback to read-only operations for clients, even when you have the SASL authentication provider set up. This is ultimately what lets Accumulo work. Like Christopher says, the work to use SASL ACLs instead of DIGEST ACLs for ZK is waiting for someone to pick it

Re: Accumulo on Google Cloud Storage

2019-01-17 Thread Josh Elser
Thanks for sharing, Maxim. What kind of failure/recovery testing did you do as a part of this? If you haven't done any yet, are you planning to do some such testing? - Josh On 1/15/19 10:02 AM, Maxim Kolchin wrote: Hi, I just wanted to leave intermediate feedback on the topic. So far,

Re: Dynamic set of visibility labels / Trusted query

2018-09-11 Thread Josh Elser
are pretty much static, but what of use cases like these? Would a filtering iterator be a better match? Thnx, Rob On Mon, Sep 10, 2018 at 6:04 PM, Josh Elser <els...@apache.org> wrote: I think you know this already, but I'm not 100% sure based on your message: trying

Re: Dynamic set of visibility labels / Trusted query

2018-09-10 Thread Josh Elser
I think you know this already, but I'm not 100% sure based on your message: trying to change the labels on the data is a bad idea. If you need to handle a case where a label means one thing on day1 and another thing on day2, you would need to build the logic to handle that. The only other

Re: Accumulo performance on various hardware configurations

2018-08-29 Thread Josh Elser
To answer your original question: YCSB is a standard benchmarking tool for databases that provides various types of read/write workloads. https://github.com/brianfrankcooper/YCSB/tree/master/accumulo1.7 On 8/29/18 8:04 AM, guy sharon wrote: hi, Continuing my performance benchmarks, I'm still

Re: Accumulo performance on various hardware configurations

2018-08-29 Thread Josh Elser
Does Muchos actually change the Accumulo configuration when you are changing the underlying hardware? On 8/29/18 8:04 AM, guy sharon wrote: hi, Continuing my performance benchmarks, I'm still trying to figure out if the results I'm getting are reasonable and why throwing more hardware at

Re: can't turn tracing on

2018-08-23 Thread Josh Elser
but all my custom ones are not written to the trace table. My user is root, table is trace and I have Table.READ, Table.WRITE on the trace table. Any ideas on what could be causing this? On Thu, Aug 23, 2018 at 3:25 PM, Josh Elser <els...@apache.org> wrote: Ensure that

Re: can't turn tracing on

2018-08-23 Thread Josh Elser
Ensure that the Tracer service is running on the node(s) specified in ${ACCUMULO_CONF_DIR}/tracers. Likely, this process died because it didn't have permission to create and write to the trace table. To fix that, you need to grant permissions to the trace user to create tables or create the

Re: unsubscribe

2018-08-14 Thread Josh Elser
Click the big red button that says "unsubscribe" http://accumulo.apache.org/contact-us/ On 8/13/18 7:23 PM, sk_ac...@yahoo.com wrote: On Monday, August 13, 2018, 2:10:46 PM EDT, Abraham Raher wrote: ...

Re: Accumulo init.d script

2018-08-08 Thread Josh Elser
Every Accumulo service creates log files in the directory you specified via the ACCUMULO_LOG_DIR environment variable in accumulo-env.sh. If you didn't define this, it likely defaults to ACCUMULO_HOME/logs. Have you looked at your syslog or similar to understand what your init.d script's

Re: Connector user switches between threads!

2018-07-05 Thread Josh Elser
Please read the Javadoc for the KerberosToken constructor as it should explain what's happening (in addition to Christopher's previous comment). On 7/3/18 11:50 PM, Mohammad Kargar wrote: Thanks for the insight. As of now I don't have any insight about the required code changes :( I'll Try

Re: Large number of used ports from tserver

2018-01-24 Thread Josh Elser
+1 to looking at the remote end of the socket and seeing where they're going/coming to/from. I've seen a few HDFS JIRA issues filed about sockets left in CLOSE_WAIT. Lucky you, this is a fun Linux rabbit hole to go down :)

Re: Problems with accumulo replication

2018-01-02 Thread Josh Elser
the peer as the only IP as the ZooKeeper Quorum value. 2017-12-29 16:07 GMT+01:00 Josh Elser <els...@apache.org>: If the system is reporting files that need to be replicated, it's probably one of two problems: * The WALs are still in use by the TabletServers. In its current imp

Re: Problems with accumulo replication

2017-12-29 Thread Josh Elser
If the system is reporting files that need to be replicated, it's probably one of two problems: * The WALs are still in use by the TabletServers. In its current implementation, the WALs are not replicated until the TabletServers no longer reference those WALs. This happens either by writing

Re: Triggers or their equivalent

2017-12-20 Thread Josh Elser
Accumulo doesn't provide any mechanism to build a traditional trigger like a traditional RDBMS or HBase. Can you share more details about what exactly you want this trigger to do? It's possible we can suggest an alternate approach which would fit naturally. If your trigger is causing other

Re: Connecting java client to Accumulo VM

2017-11-09 Thread Josh Elser
ears again. My guess is the tablet server only starts when its port is localhost. Am I using Accumulo correctly? Is it not designed to be accessed remotely? On Wed, Nov 8, 2017 at 2:20 PM, Josh Elser

Re: Connecting java client to Accumulo VM

2017-11-08 Thread Josh Elser
Accumulo chooses the network interface to bind given the resolution of the hostname that you provide in the "hosts" files in ACCUMULO_CONF_DIR. If you have "localhost" (the default) still in the files (e.g. masters, slaves), this presumably resolves to 127.0.0.1 which will result in Accumulo

Re: Accumulo as a Column Storage

2017-10-19 Thread Josh Elser
Yup, that's the intended use case. You have the flexibility to determine what column families make sense to group together. Your only "cost" in changing your mind is the speed at which you can re-compact your data. There is one concern which comes to mind. Though making many locality groups

Re: Backup and Recovery

2017-10-03 Thread Josh Elser
The s3a Hadoop FileSystem isn't robust enough to support the requirements Accumulo has to guarantee no data loss around Write-Ahead Logs. You can use the ExportSnapshot tool for Accumulo to get an immutable "picture" of a table. The expectation is that you would use DistCp to copy the files

Re: Modifying VisibilityEvaluator - Problem with Classpath for scanner

2017-08-01 Thread Josh Elser
Accumulo will not load the jars you placed in lib/ or lib/ext via java.class.path. All you need to do is place the jars in one of those directories (and restart Accumulo if in lib/). The default general.classpaths regex in accumulo-site.xml will load all jar files that are in that directory.

Re: Modifying VisibilityEvaluator - Problem with Classpath for scanner

2017-07-31 Thread Josh Elser
Want to share the log files if you're worried that you're missing something? I can assure you that Accumulo is not "suppressing real Exceptions" and giving you "fake" ones. Also, yes. Accumulo is horizontally scalable. You can run one to many TabletServers. On 7/31/17 10:54 AM, o haya

Re: Need to change location of dataDir in zoo.cfg - but Accumulo doesn't start

2017-07-30 Thread Josh Elser
Strange that it would take so long to log in. This is essentially just an RPC or two. You should be able to open the shell within a few seconds against an Accumulo instance that is up and running. To confirm, yes, reinitializing Accumulo is the step you would need to take to get Accumulo back up

Re: Missing replication metadata

2017-07-24 Thread Josh Elser
that goes. --Adam On Mon, Jul 24, 2017 at 1:55 PM, Josh Elser <josh.el...@gmail.com> wrote: On 7/24/17 1:44 PM, Adam J. Shook wrote: We had some corrupt WAL blocks on our stage environment the other day and opted to delete them.

Re: Missing replication metadata

2017-07-24 Thread Josh Elser
On 7/24/17 1:44 PM, Adam J. Shook wrote: We had some corrupt WAL blocks on our stage environment the other day and opted to delete them. We now have some missing metadata and about 3k files pending for replication. I've dug into it a bit and noticed that many of the WALs in the `order`

Re: Kerberos ticket renewal

2017-07-19 Thread Josh Elser
rt of a simple command line application - this seems to have no problem running for > 10 hours (even before I added the periodic renewal code) Will add extra logging to #2 and try to shorten the expiry from 10 hours to 1 so I can see any difference in output. James On 13 July 2017 at 16:05, Josh

Re: Kerberos ticket renewal

2017-07-13 Thread Josh Elser
pplication - this seems to have no problem running for > 10 hours (even before I added the periodic renewal code) Will add extra logging to #2 and try to shorten the expiry from 10 hours to 1 so I can see any difference in output. James On 13 July 2017 at 16:05, Josh Elser <els...@apache.org>

Re: Kerberos ticket renewal

2017-07-13 Thread Josh Elser
t cache, rather than keytab. Currently working on shortening the 10 hour expiry time so I can catch it in a debugger! Thanks, James On 13 July 2017 at 15:20, Josh Elser <els...@apache.org> wrote: If you're using Hortonworks' HDP, you would probably benefit from https://github.com/hortonworks

Re: Kerberos ticket renewal

2017-07-13 Thread Josh Elser
thrown I see a log entry "Performing ticket-cache-based Kerberos re-login". However, it should be using a keytab - have turned up the logging to 11 and will leave running overnight... James On 11 July 2017 at 16:17, Josh Elser <josh.el...@gmail.com> wrote: Nope, you've got it exac

Re: maximize usage of cluster resources during ingestion

2017-07-12 Thread Josh Elser
You probably want to split the table further than just 4 tablets per tablet server. Try tens of tablets per server. Also, merging the content from (who I assume is) your coworker on this stackoverflow post[1], I don't believe the suggestion[2] to verify WAL max size, minc threshold, and

Re: Kerberos ticket renewal

2017-07-11 Thread Josh Elser
r/base/src/main/java/org/apache/accumulo/server/security/SecurityUtil.java#L121 Or is the TGT associated with an Accumulo KerberosToken separate? Thanks, James On 11 July 2017 at 02:59, Josh Elser <josh.el...@gmail.com> wrote: No, you are (likely) not running into ACCUMULO-4069. What

Re: New blog post

2017-07-10 Thread Josh Elser
+1, great stuff, Mike! Thanks for calling it out. On Fri, Jun 30, 2017 at 12:26 PM, Mike Walch wrote: > I wrote a blog post about improvements that I made to the Accumulo > documentation for 2.0: > >

Re: Kerberos ticket renewal

2017-07-10 Thread Josh Elser
No, you are (likely) not running into ACCUMULO-4069. What you've described sounds like your client's ticket expired. Accumulo does not spawn any ticket renewal on the behalf of clients. Hadoop's UGI code will automatically spawn a renewal thread when you log in using a ticket cache. This does not

Re: 'logs/' dir in the accumulo package

2017-06-26 Thread Josh Elser
thub.com/apache/accumulo/blob/rel/1.8.1/assemble/src/main/assemblies/component.xml#L91>) so I would expect the directory to continue to be there. Is that causing you issues? On Mon, Jun 26, 2017 at 4:46 PM Josh Elser <els...@apache.org>

Re: 'logs/' dir in the accumulo package

2017-06-26 Thread Josh Elser
Srikanth, I just checked 1.7.3 and 1.8.0, both of which have the logs/ directory included in the bin-tarball. It isn't a change in packaging -- it's been like this for some time. I don't expect that we would provide any guarantees about the presence of this directory (but I don't know why

Re: ClientConfiguration using Kerberos & MapReduce

2017-06-08 Thread Josh Elser
On 6/8/17 4:10 PM, James Srinivasan wrote: [snip] https://github.com/apache/accumulo/blob/f81a8ec7410e789d11941351d5899b8894c6a322/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/ConfiguratorBase.java#L485-L500 This pulls the "DelegationTokenStub" out of the InputFormat

Re: ClientConfiguration using Kerberos & MapReduce

2017-06-07 Thread Josh Elser
On 6/7/17 3:54 PM, James Srinivasan wrote: [snip] Fortunately I found this: https://github.com/apache/hive/blob/master/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableInputFormat.java Is it a good example of Accumulo + MapReduce that I can copy? That one is

Re: ClientConfiguration using Kerberos & MapReduce

2017-05-28 Thread Josh Elser
On 5/28/17 12:13 PM, James Srinivasan wrote: [snip] I can't call AccumuloInputFormat.setConnectorInfo again since it has already been called, and I presume adding the serialised token to the Configuration would be insecure? Yeah, the configuration can't protect sensitive information.

Re: Master takes awhile to start after Accumulo start-all.sh run

2017-05-26 Thread Josh Elser
On 5/25/17 10:03 PM, o haya wrote: Hi, I followed the procedure on this page to stand up my test Accumulo 1.8.1 instance: https://www.digitalocean.com/community/tutorials/how-to-install-the-big-data-friendly-apache-accumulo-nosql-database-on-ubuntu-14-04 Everything seems to be working

Re: ClientConfiguration using Kerberos & MapReduce

2017-05-20 Thread Josh Elser
James Srinivasan wrote: Delegation tokens are serialized into the Job's "credentials" section and distributed securely that way. Ah, that's my problem. Will probably have to update the GeoMesa code to work with Jobs rather than Configurations, so that the Credentials aren't lost. Hmm, not so

Re: Trying to get SimpleIngestClient example working with Accumulo

2017-05-20 Thread Josh Elser
Have you considered using a tool like Apache Maven to handle dependency management for you? It's pretty uncommon these days to manually compile your code. Another option is using ant+ivy. If you are repackaging JARs, you may have run into a case where the jars are sealed (we've started doing

Re: ClientConfiguration using Kerberos & MapReduce

2017-05-19 Thread Josh Elser
James Srinivasan wrote: However, I seem to get this when trying to use the DelegationToken: scala> rdd.count() 17/05/19 21:30:55 INFO UserGroupInformation: Login successful for user accumulo-wink@VBOX.LOCAL using keytab file /tmp/accumulo.headless.keytab java.lang.NullPointerException at

Re: ClientConfiguration using Kerberos & MapReduce

2017-05-18 Thread Josh Elser
James Srinivasan wrote: [snip] ConfiguratorBase has no other overrides for setZookeeperInstance, so I don't see how this would ever work with Kerberos. It is marked as deprecated, which points me to AccumuloInputFormat, but I'm a little confused as to how this API relates to ConfiguratorBase.

Re: Tablet Server throwed HDFS replication error(Accumulo 1.7.2)

2017-05-17 Thread Josh Elser
Hi Takashi, Accumulo TabletServers, by default, create WALs with a size of ~1GB (think, pre-allocate the file). The error you show often comes because a Datanode cannot actually allocate that much space given its reserved space threshold. See dfs.datanode.du.reserved in hdfs-site.xml To
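
A rough way to reason about the reserved-space point above is the sketch below. It is a simplified model, not HDFS's actual accounting: it assumes a datanode volume can offer only its free space minus the amount set aside by dfs.datanode.du.reserved, and that a WAL needs roughly 1 GiB of that to be allocated. The function names and the sample numbers in the note are hypothetical.

```python
GIB = 1024 ** 3


def usable_bytes(free_bytes, reserved_bytes):
    """Space a volume can offer once the reserved amount is set aside (simplified model)."""
    return max(free_bytes - reserved_bytes, 0)


def can_hold_wal(free_bytes, reserved_bytes, wal_bytes=GIB):
    """True if the volume still has room for one pre-allocated WAL of wal_bytes."""
    return usable_bytes(free_bytes, reserved_bytes) >= wal_bytes
```

Under this model, a volume with 5 GiB free and just over 4 GiB reserved cannot take a ~1 GiB WAL even though the disk looks far from full, while the same volume with 1 GiB reserved can.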

Re: Configure NetBeans to build projet using Accumulo

2017-05-09 Thread Josh Elser
Giuseppe, Please use the mailing list for questions about Apache Accumulo and refrain from emailing participants directly. I do not use Netbeans, but I'm sure someone else does. Hopefully they can help you. - Josh giusur...@gmail.com wrote: Dear Sir. Josh, I'm a computer student in the

Re: Repeating connections with SASL

2017-04-17 Thread Josh Elser
I'm kind of surprised as to how you're getting the NoClassDefFoundError. I don't think that's at all related to disabling sasl client auth (as Dave suggested). Dave Marion wrote: Add the following JVM argument to the server processes: -Dzookeeper.sasl.client=false On April 17, 2017 at

Re: Accumulo on Azure / WebHDFS

2017-04-17 Thread Josh Elser
but that's it. James Hughes wrote: Hi Josh, Thanks again! As a follow-up, is any of the information about Accumulo on WASB or ADL public? I suppose I'm curious about configuration (is it just plug-and-play?) and performance. Thanks in advance, Jim On Sat, Apr 15, 2017 at 2:25 PM, Josh Else

Re: Accumulo on Azure / WebHDFS

2017-04-15 Thread Josh Elser
WebHDFS handled or worked around limitations from S3, etc. Cheers, Jim On Fri, Apr 14, 2017 at 12:47 PM, Josh Elser <josh.el...@gmail.com> wrote: > Hi Jim, > > I can say that Accumulo will work on Azure's blob store and their data > lake store. These are a result of testing I'm invo

Re: Accumulo on Azure / WebHDFS

2017-04-14 Thread Josh Elser
Hi Jim, I can say that Accumulo will work on Azure's blob store and their data lake store. These are a result of testing I'm involved with at Hortonworks (dayjob). I know that these filesystems are tested to an appropriate degree, proving that they do provide the things that Accumulo needs. As a

Re: setauth configuration

2017-04-13 Thread Josh Elser
No, you need to have Accumulo running in order to set the authorizations for a user. Yes, user authorizations are persisted across restarts of Accumulo. lo...@aol.com wrote: Is it possible to configure setauth without having to call ‘accumulo shell setauth’ against a running accumulo? And

Re: New accumulo instance with existed HDFS cluster

2017-04-06 Thread Josh Elser
No worries! Email makes it hard as well :) Lee wrote: Hi, Josh: Sorry for my poor English. It's just the same case you assumed -- I will reinstall the Accumulo. Your reply really solved my problem. I will try it later. Thanks At 2017-04-05 22:50:28, "Josh Elser"<josh.el

Re: New accumulo instance with existed HDFS cluster

2017-04-05 Thread Josh Elser
Lee -- it sounds like you want to take the data from an old Accumulo cluster, install Accumulo on a new set of nodes, and then start the new instance of Accumulo against the old installation? If that's the case, this should be very painless. Just make sure that the old instance is completely shut

Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]

2017-02-22 Thread Josh Elser
so look at the rfiles that compose that tablet to see if anything sticks out. Any logs that would help explain why the tablet server is dying? Can you increase the memory of the tserver? Mike On Tue, Feb 21, 2017 at 10:35 AM Josh Elser <josh.el...@gmail.com

Re: Qonduit - secure web socket proxy for Accumulo

2017-02-22 Thread Josh Elser
, 2017 at 1:55 PM Josh Elser<josh.el...@gmail.com> wrote: Neat. Thanks for sharing! Any examples to show how a client would use it? Regarding the security, does it encompass authentication and privacy (encryption)? Any experience with certain implementations for the Spring security module

Re: Qonduit - secure web socket proxy for Accumulo

2017-02-22 Thread Josh Elser
Neat. Thanks for sharing! Any examples to show how a client would use it? Regarding the security, does it encompass authentication and privacy (encryption)? Any experience with certain implementations for the Spring security modules (e.g. which ones you've tested to work)? Dave Marion

Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]

2017-02-21 Thread Josh Elser
e tablet server is dying? Can you increase the memory of the tserver? Mike On Tue, Feb 21, 2017 at 10:35 AM Josh Elser <josh.el...@gmail.com> wrote: ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception comm

Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]

2017-02-21 Thread Josh Elser
its the shell. >> >> >> -Original Message- >> From: Dickson, Matt MR [mailto:matt.dick...@defence.gov.au] >> Sent: Tuesday, 21 February 2017 09:30 >> To: 'user@accumulo.apache.org' >> Subject: RE: accumulo.root invalid table reference [SEC=UNOFFIC

Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]

2017-02-20 Thread Josh Elser
The root table should only reference the tablets in the metadata table. It's a hierarchy: like metadata is for the user tables, root is for the metadata table. What version are ya running, Matt? Dickson, Matt MR wrote: *UNOFFICIAL* I have a situation where all tablet servers are

Re: Status record lacked createdTime

2017-02-17 Thread Josh Elser
Hey Adam, Thanks for sharing this one. Adam J. Shook wrote: Hello folks, One of our clusters has been throwing a handful of replication errors from the status maker -- see below. The WAL files in question do not belong to an active tserver -- some investigation in the code shows that the

Re: Improving Accumulo Replication Latency

2017-02-15 Thread Josh Elser
buy lunch! Cheers, --Adam On Wed, Feb 15, 2017 at 2:52 PM, Josh Elser <josh.el...@gmail.com> wrote: Hi Adam, I'm not presently working on anything (too many irons in other fires), but I'd be happy to help work through a design doc for

Re: Improving Accumulo Replication Latency

2017-02-15 Thread Josh Elser
Hi Adam, I'm not presently working on anything (too many irons in other fires), but I'd be happy to help work through a design doc for improvements. Do you have a list of pain-points which are the primary causes of latency? That would help in identifying the changes to make and how best to

Re: data miss when use rowiterator

2017-02-09 Thread Josh Elser
Just to be clear, Lu, for now stick to using a Scanner with the RowIterator :) It sounds like we might have to re-think how the RowIterator works with the BatchScanner... Christopher wrote: I suspected that was the case. BatchScanner does not guarantee ordering of entries, which is needed
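
The ordering issue discussed above can be sketched with a toy model. This is not RowIterator's implementation; it only assumes that rows are assembled by grouping adjacent entries that share a row ID, which is why sorted input (as from a Scanner) matters. The entries below are made-up sample data.

```python
from itertools import groupby


def group_consecutive_rows(entries):
    """Group (row, column, value) entries into rows, assuming same-row entries are adjacent."""
    return [
        (row, [(col, val) for _, col, val in group])
        for row, group in groupby(entries, key=lambda entry: entry[0])
    ]


# Sorted input, as a Scanner would return it: each row's entries are adjacent.
sorted_entries = [
    ("row1", "a", "1"), ("row1", "b", "2"),
    ("row2", "a", "3"), ("row2", "b", "4"),
]

# Interleaved input, as an unordered source might return it.
interleaved_entries = [
    ("row1", "a", "1"), ("row2", "a", "3"),
    ("row1", "b", "2"), ("row2", "b", "4"),
]
```

With sorted_entries the grouping yields two complete rows. With interleaved_entries it yields four fragments, each holding only part of a row, so a consumer that expects whole rows appears to be missing data.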

Re: how to make LZO compression work?

2017-02-07 Thread Josh Elser
If you're amenable to it, Snappy should have similar performance without the need for the native library :) Massimilian Mattetti wrote: Hi all, I got stuck trying to enable the LZO compression on a table. I installed the native-lzo library on each tablet server (sudo apt-get install

Re: accumulo enhancement suggestions

2017-01-28 Thread Josh Elser
Please use the user@accumulo.apache.org mailing list in the future for these kinds of questions/requests. It is not in good taste to send private emails to those involved in any Apache projects. -- The functionality you're suggesting already exists for all versions you mentioned.

Re: Tablet count per table [SEC=UNOFFICIAL]

2017-01-23 Thread Josh Elser
I believe this would be the number of unique rows in the Accumulo metadata table (that don't fall in the "~del" prefix). Not sure if we have a class you could just invoke, but this would be pretty easy to script just using the Accumulo shell. Dickson, Matt MR wrote: *UNOFFICIAL* Is there a
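
The counting rule suggested above can be sketched as follows, taking it at face value: count the distinct row IDs that do not start with "~del". The row IDs in the note below are hypothetical sample values, not output from a real metadata table, and reading the rows out of Accumulo (for example with the shell) is left out.

```python
def count_tablets(metadata_rows):
    """Count distinct metadata row IDs, skipping those under the "~del" prefix."""
    return len({row for row in metadata_rows if not row.startswith("~del")})


# Hypothetical row IDs: a row may appear several times (once per column),
# and "~del" rows are deletion markers rather than tablets.
sample_rows = [
    "2;g", "2;g", "2;m", "2<",
    "~del/hypothetical/file.rf",
]
```

For sample_rows this counts three tablets: "2;g" is counted once despite appearing twice, and the "~del" row is ignored.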

Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]

2017-01-16 Thread Josh Elser
Just to clarify: compactions don't implicitly change the table "distribution". The number and/or boundaries of tablets don't change as a part of a compaction. Yamini Joshi wrote: Just a thought, will forcing a major compaction take care of this? Merging smaller tablets and deleting empty

Re: Replication Latency

2017-01-11 Thread Josh Elser
, but replication is taking about 12 to 15 min to complete. Even though the wal is not being written to after 3m I am not seeing it ready for replication (closed: true) until after 13m. On Wed, Jan 11, 2017 at 5:44 PM, Josh Elser <josh.el...@gmail.com> wro

Re: Replication Latency

2017-01-11 Thread Josh Elser
See org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences for where WALs are currently marked as "closed". I don't recall the details, but I think there was some issue with trying to close them in TabletServerLogger. Yes to your last question: if it were done in TabletServerLogger,

Re: Unsubscribe

2017-01-11 Thread Josh Elser
See http://accumulo.apache.org/mailing_list/ You unsubscribe yourself just like you subscribed yourself. Paul Tremblett wrote:

Re: is there any "trick" to save the state of an iterator?

2017-01-09 Thread Josh Elser
> Dylan can explain (if necessary). > Regards. -Jeremy > > On Mon, Jan 09, 2017 at 07:30:03PM -0500, Josh Elser wrote: > > Great. Glad I wasn't derailing things :) > > > > Unfortunately, I don't think this is a very well-documented area of the > > code

Re: is there any "trick" to save the state of an iterator?

2017-01-09 Thread Josh Elser
no fancy way to overcome > this problem. > > Is there any good documentation on different query planning for Accumulo > that could help with my use case? > Thanks. > > Regards, > Max > > > > > From: Josh Elser <josh.el...@gmail.com> > To: user@accumu

Re: is there any "trick" to save the state of an iterator?

2017-01-09 Thread Josh Elser
Hey Max, There is no provided mechanism to do this, and this is a problem with supporting "range queries". I'm hoping I'm understanding your use-case correctly; sorry in advance if I'm going off on a tangent. When performing the standard sort-merge join across some columns to implement

Re: Teardown and deepCopy

2017-01-04 Thread Josh Elser
I would suggest that your approach is flawed from the start. Consider the following case: You read through the first half of a tablet and have collected a set of 1000 IDs which you have seen. When you try to read the second half of the tablet, the TabletServer dies from an OOME. The Tablet is

Re: Importtable command not working from Java API

2016-12-15 Thread Josh Elser
I would double-check your spelling, extra spaces, or generally malformed input data. https://github.com/apache/accumulo/blob/master/shell/src/main/java/org/apache/accumulo/shell/commands/ImportTableCommand.java#L33 You can see that the importtable shell command makes the same exact invocation

Re: HDFS vs Accumulo Performance

2016-12-05 Thread Josh Elser
If you're only ever doing sequential scans, IMO, it's expected that HDFS would be faster. Remember that, architecturally, Accumulo is designed for *random-read/write* workloads. This is where it would shine in comparison to HDFS. Accumulo is always going to have a hit in sequential read/write

Re: Master server throw AccessControlException

2016-12-05 Thread Josh Elser
Apache mailing lists strip attachments. Please host the files somewhere and provide a link to them. On Dec 4, 2016 20:54, "Takashi Sasaki" wrote: > Hello, > > I'm sorry for giving some wrong information in my first post. > > I asked the project members again about the problem. >

Re: copying Accumulo tables to another cluster

2016-12-02 Thread Josh Elser
It will not work to copy the raw HDFS directory. Please use the importtable and exporttable Accumulo shell commands. http://accumulo.apache.org/1.8/examples/export Jayesh Patel wrote: I’m hoping to copy the data from my Accumulo cluster to a new cluster using the Hadoop distcp command and run

Re: Accumulo Working

2016-11-22 Thread Josh Elser
-> it2 -> it3 -> client (if the max limit is reached) or is it always at the end of the pipeline? Best regards, Yamini Joshi On Tue, Nov 22, 2016 at 12:36 PM, Josh Elser <josh.el...@gmail.com> wrote: Scanners are sequentially comm

Re: how to reduce re-seeking rate?

2016-11-22 Thread Josh Elser
There isn't any funny classloading happening in the normal case, so having the log4j2.xml file in your jar should be sufficient. Caveat is if you're using the HDFS classloading stuff, but that's something you would have enabled by hand if you're using it. I think the scan max memory that Dave

Re: Accumulo Working

2016-11-22 Thread Josh Elser
Scanners are sequentially communicating with TabletServers, as opposed to BatchScanners which do this communication in parallel. Scanners aren't so much "merging" data, but requesting it in sorted order from the appropriate TabletServer. All Iterators are applied to some batch of results from

Re: upgrade from 1.7 to 1.8 faild

2016-11-22 Thread Josh Elser
Hi Lu, What version of 1.7 did you upgrade from? 1.7.0? 1.7.1? Can you please share the exact steps you took to upgrade? Can you share your accumulo-site.xml, please? Thanks. Lu Q wrote: I have a 1.7 accumulo, and now I upgrade it to 1.8. I run the start-here.sh, it looks well. ``` Starting

Re: Detecting database changes

2016-11-22 Thread Josh Elser
+1 Christopher wrote: Apache Fluo can do this with Accumulo: https://fluo.apache.org On Tue, Nov 22, 2016, 07:26 vaibhav thapliyal wrote: Hi, I have a use case where I need to send out notifications based on

Re: Replication Work Assigner Per Table?

2016-11-21 Thread Josh Elser
with the process. On Mon, Nov 21, 2016 at 11:22 AM, Josh Elser <josh.el...@gmail.com> wrote: Aha, want to open a patch to fix the docs? Noe Detore wrote: https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/rep

Re: Clear Cache

2016-11-19 Thread Josh Elser
Hi Yamini, I'd just add one word of caution about knowing exactly what you're trying to measure. For example, running benchmarks over a cold system is not going to be representative of real cluster performance. To prevent caching from affecting time, you would need to do the following: *

Re: Replication Work Assigner Per Table?

2016-11-18 Thread Josh Elser
What made you think that this is allowed to be a per-table property? If the docs are unclear, we should fix that. It is a global-only property. Noe Detore wrote: I noticed in replication.txt implied replication.work.assigner can be set per table table.root@accumulo_primary> config -t

Re: VFS class reloading

2016-11-17 Thread Josh Elser
Probably related to the race condition as described in https://issues.apache.org/jira/browse/VFS-487 You can update the dependency to 2.1 by hand in $ACCUMULO_HOME/lib, or use the fixed out of the box version in 1.7.2 or 1.8.0 via https://issues.apache.org/jira/browse/ACCUMULO-3470 Wyatt

Re: Replication Latency and Metrics

2016-11-16 Thread Josh Elser
No, Accumulo tracks replication at the (write-ahead log) file level, not at the update level. For the names you've listed, you should also see values for each. I am assuming that you did not copy all of the exposed metrics. I believe the NumOps and AvgTime values are information generated by

Re: Best Practices and Deployment Guides?

2016-11-15 Thread Josh Elser
Hi Larry, The O'Reilly book has been out for some time now: http://shop.oreilly.com/product/0636920032304.do And, if you weren't aware, the Admin guide became the Users Manual. Can you explain what kind of information you're specifically looking for? It's generally hard for Accumulo to

Re: Running low on memory and Zookeeper Session expired / disconnected

2016-11-11 Thread Josh Elser
Hi Mario, First off, look for any JVM GC pause warnings in the TabletServer logs. There's a callback which should fire at a given interval and will warn when it does not. Typically, this is because of a stop-the-world GC cycle. These cycles also block the ZK heartbeat action as Dylan

Re: HDFS Replication of data

2016-11-10 Thread Josh Elser
Likely, there isn't going to be a positive impact to read performance with an increased number of replicas (unless the number of replicas approaches the number of datanodes, which is infeasible except for very, very small instances). Given Accumulo's lax policy of Tablet placement WRT HDFS

Re: Using Flink with Accumulo

2016-11-07 Thread Josh Elser
Oliver Swoboda wrote: Hi Josh, thank you for your quick answer! 2016-11-03 17:03 GMT+01:00 Josh Elser <els...@apache.org>: Hi Oliver, Cool stuff. I wish I knew more about Flink to make some better suggestions. Some points inline, and sorr
