I have been thinking a lot about how to make this piece more push-button (like the rest of Blur), and I think I have come up with a possible solution (with Aaron's help). I wanted to run it by everyone before I get too far down the path.
Server setup: I think we should have a simple Jetty server that serves up the initial pages and contains an embeddable DB for a very small amount of application data (I will get to what this is in just a second). Starting the server could be added to the Blur start-all script if we want, so that the console is available as soon as Blur starts, with no extra work.

Features: I want to run through the existing features and explain how I think they should work in this version. For the most part, I am leaning towards having ~95% of the application be JavaScript running in the user's browser itself.

Dashboard: This page currently displays node status information (overall ZooKeepers, controllers, and shards) for multiple instances of Blur. It also displays some high-level HDFS information, but depending on what Hue provides, we might just remove that part. I think we can build something very similar to what we have now, where a small piece of code in the Jetty server records node status into the embeddable database and the browser polls for updates to display.

Environment: This page gives detailed node status information for a specific Blur instance. It would use exactly the same information as the Dashboard, just displayed differently for that instance.

Tables: This page lists the tables in a specific instance of Blur. Today most of this information comes from the Blur API itself, and now that Aaron is generating the Thrift API in JS, we can just ask for it on demand from the browser. We need to make sure that making the call on demand, as opposed to polling, has no performance implications for Blur itself. This page also displays some shard server layout and schema information, all of which can be obtained through the API.

Queries: This page gives information about recent queries that have gone through a specific Blur instance.
This is all achieved through the Blur API, so this information can also be obtained on demand from the browser. The one issue I'm not yet sure how to handle is that Blur currently keeps queries around for only 2 minutes, whereas the current agent keeps 30-60 minutes' worth of queries to help troubleshoot the running system. Maybe we can use a hybrid approach with the Jetty server.

Search: This page allows searching a specific instance of Blur. I think all of this can be done through the JS API; we will just have to do something about the user preference for choosing a priority column family, and maybe saved searches (though I'm thinking we can use browser local storage to accomplish this).

HDFS: This section contains HDFS stats and metrics as well as a file browser. I think we can remove it and defer to tools like Hue.

Audit: This section displays an audit of destructive actions performed through this tool (disabling tables, deleting tables, forgetting nodes, etc.). In my experience this has been useful when a table disappears and we don't know who removed it. Note that it does not audit actions taken through the shell or other external tools. I would like feedback on how useful others feel this is.

Admin: This allows for controlling users and their roles in the system. We originally added it so that we could:
1. keep this tool from being available to just anyone (it will contain production data),
2. restrict destructive actions, and
3. control who can see certain information (e.g. actual query content).
We can probably use the Jetty server for at least a basic user setup, though once someone has the Blur API in JS, they could run it themselves.

I know this is a lot of information, but I think we need to make this tool easier and faster for everyone to use. Please let me know what you think.
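To make the "very small amount of application data" a bit more concrete, here is a rough sketch of what the Jetty server's embeddable DB might hold: node status that the browser polls for (Dashboard/Environment), and a longer query history for the hybrid Queries approach. It is in Python only to keep it short and self-contained; sqlite3 stands in for whatever embeddable DB we actually pick, and the table layout, the `ConsoleStore` class, its method names, and the 60-minute retention are all hypothetical. The collector that actually talks to Blur is not shown.

```python
import sqlite3
import time

# Hypothetical schema; sqlite3 is only a stand-in for the real embeddable DB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS node_status (
    cluster TEXT, node TEXT, kind TEXT, online INTEGER, updated REAL,
    PRIMARY KEY (cluster, node)
);
CREATE TABLE IF NOT EXISTS query_history (
    cluster TEXT, query_id TEXT, query_text TEXT, started REAL,
    PRIMARY KEY (cluster, query_id)
);
"""

# Assumed retention: keep ~60 minutes of queries vs. ~2 minutes in Blur itself.
QUERY_RETENTION_SECS = 60 * 60


class ConsoleStore:
    """Hypothetical server-side store for the console's small amount of data."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(SCHEMA)

    def record_status(self, cluster, node, kind, online, now=None):
        """Upsert one node's status; called by the server-side collector."""
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO node_status VALUES (?, ?, ?, ?, ?)",
            (cluster, node, kind, int(online), now),
        )
        self.db.commit()

    def status_since(self, since):
        """What a browser poll would fetch: rows recorded after `since`."""
        return self.db.execute(
            "SELECT cluster, node, kind, online, updated FROM node_status "
            "WHERE updated > ? ORDER BY cluster, node",
            (since,),
        ).fetchall()

    def merge_queries(self, cluster, queries, now=None):
        """Merge a snapshot of Blur's short-lived query list, then prune old rows.

        `queries` is a list of (query_id, query_text, started) tuples.
        """
        now = time.time() if now is None else now
        # INSERT OR IGNORE: overlapping snapshots do not create duplicates.
        self.db.executemany(
            "INSERT OR IGNORE INTO query_history VALUES (?, ?, ?, ?)",
            [(cluster, qid, text, started) for qid, text, started in queries],
        )
        self.db.execute(
            "DELETE FROM query_history WHERE started < ?",
            (now - QUERY_RETENTION_SECS,),
        )
        self.db.commit()

    def recent_queries(self, cluster):
        """Queries still inside the retention window, oldest first."""
        return self.db.execute(
            "SELECT query_id, query_text, started FROM query_history "
            "WHERE cluster = ? ORDER BY started",
            (cluster,),
        ).fetchall()


if __name__ == "__main__":
    store = ConsoleStore()
    store.record_status("prod", "shard-1", "shard", True, now=100.0)
    store.record_status("prod", "zk-1", "zookeeper", True, now=105.0)
    print(store.status_since(102.0))  # [('prod', 'zk-1', 'zookeeper', 1, 105.0)]
```

The idea is that the browser remembers the timestamp of its last poll and only gets back rows written since then, while the server periodically merges Blur's 2-minute query window into its own longer history.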
Thanks,
Chris

On Fri, Sep 13, 2013 at 5:50 PM, Chris Rohr wrote:
> I'm not familiar with it, but does Hue do HDFS capacity and node status?
> I could definitely see taking out the file browser part.
>
>
> On Fri, Sep 13, 2013 at 8:31 AM, Garrett Barton wrote:
>
>> I think hue covers that functionality pretty well. The only slightly odd
>> part is afaik hue doesn't talk to multi clusters and blur is typically on a
>> separate cluster.
>> On Sep 13, 2013 4:26 AM, "Aaron McCurry" wrote:
>>
>> > At this point I would vote to remove it. If we take HBase as an example, I
>> > don't think that they provide a way to interact with the FileSystem read or
>> > write through any kind of web interface.
>> >
>> > Aaron
>> >
>> >
>> > On Thu, Sep 12, 2013 at 4:29 PM, Chris Rohr wrote:
>> >
>> > > Hi all,
>> > >
>> > > As we look to take the console to the next level, I was wondering what
>> > > everyone's thoughts are the usefulness of the HDFS portion of the console.
>> > > This portion has the following features:
>> > >
>> > > 1. File system browser
>> > >    a. Viewer
>> > >    b. Upload
>> > >    c. Rename
>> > >    d. Delete
>> > > 2. Node status
>> > > 3. Capacity status
>> > >
>> > > I am just wondering if this should be here or if we should just rip it out
>> > > and let people use something like Hue for this information.
>> > >
>> > > Chris
