I have been doing a lot of thinking about how to make this piece more
push-button (like the rest of Blur), and I think I have come up with a
possible solution (with Aaron's help). I wanted to run it by everyone before
I get too far down the path.

Server setup:

I think that we should have a simple Jetty server that serves up the
initial pages and contains an embeddable database for a very small amount
of application data (I will get to what this is in just a second). Starting
the server could be added to Blur's start-all if we want, so that it is
available as soon as Blur starts without extra work.

Features:

I want to run through the existing features and explain how I think they
should work in this version. For the most part, I am leaning towards having
~95% of the application be JavaScript running in the user's browser itself.

Dashboard:

This page currently displays node status information (overall ZooKeeper,
controller, and shard status) for multiple instances of Blur. It also
displays some high-level HDFS information, but depending on what Hue
provides we might just remove that part. I think we can build something very
similar to what we have now: a small piece of code in the Jetty server
records node status into the embeddable database, and the browser polls for
updates to display.
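To make the record-and-poll idea a little more concrete, here is a rough
TypeScript sketch of the browser half. It assumes the Jetty recorder writes
one row per node per check with a hypothetical shape (cluster, node kind,
host, online flag, timestamp), and `fetchStatus` is a placeholder for
however the browser ends up reading those rows; none of these names exist
in Blur today.

```typescript
// Sketch only: the record shape and all names below are hypothetical.
type NodeKind = "zookeeper" | "controller" | "shard";

interface NodeStatus {
  cluster: string;    // which Blur instance the node belongs to
  kind: NodeKind;
  host: string;
  online: boolean;
  recordedAt: number; // epoch millis when the Jetty-side recorder saw this status
}

interface KindCounts {
  online: number;
  offline: number;
}

type ClusterSummary = Record<NodeKind, KindCounts>;

// The DB holds a history, so keep only the newest record for each node.
function latestPerNode(records: NodeStatus[]): NodeStatus[] {
  const latest = new Map<string, NodeStatus>();
  for (const r of records) {
    const key = `${r.cluster}|${r.kind}|${r.host}`;
    const prev = latest.get(key);
    if (prev === undefined || r.recordedAt > prev.recordedAt) {
      latest.set(key, r);
    }
  }
  return Array.from(latest.values());
}

// Roll each node's latest status up into per-cluster online/offline counts.
function summarize(records: NodeStatus[]): Map<string, ClusterSummary> {
  const result = new Map<string, ClusterSummary>();
  for (const r of latestPerNode(records)) {
    let summary = result.get(r.cluster);
    if (summary === undefined) {
      summary = {
        zookeeper: { online: 0, offline: 0 },
        controller: { online: 0, offline: 0 },
        shard: { online: 0, offline: 0 },
      };
      result.set(r.cluster, summary);
    }
    if (r.online) summary[r.kind].online++;
    else summary[r.kind].offline++;
  }
  return result;
}

// Browser side: poll on a fixed interval and hand each summary to a render
// callback. Returns a function that stops polling.
function startPolling(
  fetchStatus: () => Promise<NodeStatus[]>, // placeholder for the real read
  render: (summary: Map<string, ClusterSummary>) => void,
  intervalMs = 5000,
): () => void {
  const tick = async () => {
    try {
      render(summarize(await fetchStatus()));
    } catch (err) {
      console.error("status poll failed", err);
    }
  };
  void tick();
  const handle = setInterval(tick, intervalMs);
  return () => clearInterval(handle);
}
```

The Environment page could reuse the same rows and simply pick one cluster
out of the summary instead of showing all of them.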

Environment:

This page gives detailed node status information for a specific Blur
instance. It would use exactly the same information as the Dashboard, just
displayed differently for that instance.

Tables:

This page lists the tables in a specific instance of Blur. Most of this
information today comes from the Blur API itself, and now that Aaron is
generating the Thrift API in JS, we can just ask for it on demand from the
browser. We need to make sure there are no performance implications for
Blur itself if we make these calls on demand as opposed to polling. This
page also displays some shard server layout and schema information, which
can all be obtained through the API.

Queries:

This page gives information about recent queries that have gone through a
specific Blur instance. This is all available through the Blur API, so the
information can be obtained on demand from the browser the same way. The
only issue I'm not yet sure how to handle is that Blur currently keeps
queries around for 2 minutes, while the current agent keeps 30-60 minutes'
worth of queries to help troubleshoot issues in the running system. Maybe we
could use a hybrid approach with the Jetty server.
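The hybrid approach could look something like this on the Jetty side: poll
Blur's recent queries more often than its 2-minute window, merge each poll
into a buffer keyed by query ID, and evict anything that started before the
retention window (60 minutes, say). `QuerySnapshot` is a made-up shape, not
Blur's actual query status type:

```typescript
// Hypothetical summary of one query as read from Blur; not Blur's real type.
interface QuerySnapshot {
  uuid: string;
  startTime: number; // epoch millis
  state: string;     // e.g. "RUNNING" or "DONE"
}

// Lives in the Jetty server and keeps queries long after Blur's own
// ~2 minute window has dropped them. It has to be fed more often than every
// 2 minutes, or queries can expire from Blur before we ever see them.
class QueryHistory {
  private readonly byId = new Map<string, QuerySnapshot>();
  private readonly retentionMs: number;

  constructor(retentionMs: number) {
    this.retentionMs = retentionMs;
  }

  // Merge one poll's results; a newer snapshot of the same query replaces
  // the older one, so state changes (RUNNING -> DONE) are picked up.
  merge(snapshots: QuerySnapshot[], now: number): void {
    for (const s of snapshots) this.byId.set(s.uuid, s);
    this.evict(now);
  }

  // Newest first, for display.
  recent(): QuerySnapshot[] {
    return Array.from(this.byId.values()).sort((a, b) => b.startTime - a.startTime);
  }

  // Drop queries that started before the retention window.
  private evict(now: number): void {
    const cutoff = now - this.retentionMs;
    this.byId.forEach((s, id) => {
      if (s.startTime < cutoff) this.byId.delete(id);
    });
  }
}
```

The browser would then read from this buffer for anything older than what
Blur itself still holds.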

Search:

This page allows for searching a specific instance of Blur. I think that
all of this can be done through the JS API; we will just have to do
something about the user preference for choosing a priority column family
and maybe saved searches (though I'm thinking we can use browser local
storage to accomplish this).
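For the local storage idea, a minimal sketch of per-instance search
preferences. It only relies on `getItem`/`setItem`, so `window.localStorage`
can be passed in directly; the key prefix and preference shape are made up
for illustration:

```typescript
// Only getItem/setItem are used, so window.localStorage fits this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical preference shape, stored once per Blur instance.
interface SearchPrefs {
  priorityColumnFamily?: string;
  savedSearches: { name: string; query: string }[];
}

const PREFS_KEY = "blur-console.search-prefs"; // made-up key prefix

function keyFor(cluster: string): string {
  return `${PREFS_KEY}.${cluster}`;
}

function loadPrefs(store: KeyValueStore, cluster: string): SearchPrefs {
  const raw = store.getItem(keyFor(cluster));
  if (raw === null) return { savedSearches: [] };
  try {
    const parsed = JSON.parse(raw) as Partial<SearchPrefs>;
    return {
      priorityColumnFamily: parsed.priorityColumnFamily,
      savedSearches: Array.isArray(parsed.savedSearches) ? parsed.savedSearches : [],
    };
  } catch {
    return { savedSearches: [] }; // unreadable entry: start fresh
  }
}

function savePrefs(store: KeyValueStore, cluster: string, prefs: SearchPrefs): void {
  store.setItem(keyFor(cluster), JSON.stringify(prefs));
}

function setPriorityColumnFamily(store: KeyValueStore, cluster: string, family: string): void {
  const prefs = loadPrefs(store, cluster);
  prefs.priorityColumnFamily = family;
  savePrefs(store, cluster, prefs);
}

function saveSearch(store: KeyValueStore, cluster: string, name: string, query: string): void {
  const prefs = loadPrefs(store, cluster);
  // A search saved under an existing name replaces the old one.
  prefs.savedSearches = prefs.savedSearches.filter((s) => s.name !== name);
  prefs.savedSearches.push({ name, query });
  savePrefs(store, cluster, prefs);
}
```

In the browser this would be called as
`saveSearch(window.localStorage, "prod", "errors", query)`. The trade-off is
that local storage lives in one browser, so saved searches would not follow
a user to another machine.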

HDFS:

This section contained HDFS stats and metrics as well as a file browser.  I
think we can remove this and defer to tools like Hue.

Audit:

This section displays an audit of destructive actions performed through
this tool (disabling tables, deleting tables, forgetting nodes, etc.). In my
experience this has been useful when a table disappears and we don't know
who removed it. Note that it does not capture actions taken through the
shell or other external tools. I would like feedback on how useful others
think this is.
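If we keep it, the audit record itself is small. A sketch with an in-memory
array standing in for the embedded DB and a hypothetical set of action
names; the one design choice worth calling out is logging the attempt
before running the action, so even a failed call leaves a trace:

```typescript
// Sketch: in the console these rows would live in the Jetty server's
// embedded DB; an in-memory array stands in for it here.
type AuditAction = "DISABLE_TABLE" | "DELETE_TABLE" | "FORGET_NODE";

interface AuditEntry {
  user: string;
  action: AuditAction;
  cluster: string;
  target: string; // table name or node host
  at: number;     // epoch millis
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // Append-only: entries are never updated or removed.
  record(entry: AuditEntry): void {
    this.entries.push({ ...entry });
  }

  // "Who touched this table?" Most recent first.
  historyFor(cluster: string, target: string): AuditEntry[] {
    return this.entries
      .filter((e) => e.cluster === cluster && e.target === target)
      .sort((a, b) => b.at - a.at);
  }
}

// Wrap a destructive call so the attempt is logged before it runs; a failed
// or interrupted call still leaves a record of who tried it.
function audited(log: AuditLog, entry: AuditEntry, run: () => Promise<void>): Promise<void> {
  log.record(entry);
  return run();
}
```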

Admin:

This allows for controlling users and their roles in the system. We
originally built this so that 1. the tool isn't available to just anyone
(it will contain production data), 2. destructive actions are restricted,
and 3. we can control who sees certain information (e.g. actual query
content). We can probably use the Jetty server to do at least a basic user
setup, though once someone has the Blur API in JS they could run it
themselves.
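A sketch of the role check, with three hypothetical roles mapped onto the
three reasons above. Since it only guards requests that actually go through
the Jetty server, it has the same gap noted above: someone calling the Blur
JS API directly bypasses it.

```typescript
// Hypothetical roles and permissions; nothing here exists in Blur today.
type ConsoleRole = "viewer" | "analyst" | "admin";
type ConsolePermission = "VIEW_STATUS" | "SEARCH" | "VIEW_QUERY_CONTENT" | "DESTRUCTIVE";

// Reason 1: a user with no roles gets no permissions at all.
const ROLE_PERMISSIONS: Record<ConsoleRole, ReadonlySet<ConsolePermission>> = {
  viewer: new Set<ConsolePermission>(["VIEW_STATUS"]),
  analyst: new Set<ConsolePermission>(["VIEW_STATUS", "SEARCH", "VIEW_QUERY_CONTENT"]),
  admin: new Set<ConsolePermission>(["VIEW_STATUS", "SEARCH", "VIEW_QUERY_CONTENT", "DESTRUCTIVE"]),
};

function can(roles: readonly ConsoleRole[], permission: ConsolePermission): boolean {
  return roles.some((r) => ROLE_PERMISSIONS[r].has(permission));
}

// Reason 2: refuse destructive actions for users without the permission.
function requireDestructive(user: string, roles: readonly ConsoleRole[]): void {
  if (!can(roles, "DESTRUCTIVE")) {
    throw new Error(`${user} is not allowed to perform destructive actions`);
  }
}

// Reason 3: hide query text from users who may not see it.
function redactQuery<T extends { query: string }>(q: T, roles: readonly ConsoleRole[]): T {
  return can(roles, "VIEW_QUERY_CONTENT") ? q : { ...q, query: "[hidden]" };
}
```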

I know this is a lot of information, but I think we need to make this tool
easier and faster for everyone to use.  Please let me know what you think.

Thanks,
Chris


On Fri, Sep 13, 2013 at 5:50 PM, Chris Rohr <[email protected]> wrote:

> I'm not familiar with it, but does Hue do HDFS capacity and node status?
> I could definitely see taking out the file browser part.
>
>
> On Fri, Sep 13, 2013 at 8:31 AM, Garrett Barton <[email protected]> wrote:
>
>> I think Hue covers that functionality pretty well. The only slightly odd
>> part is that, afaik, Hue doesn't talk to multiple clusters, and Blur is
>> typically on a separate cluster.
>> On Sep 13, 2013 4:26 AM, "Aaron McCurry" <[email protected]> wrote:
>>
>> > At this point I would vote to remove it. If we take HBase as an
>> > example, I don't think that they provide a way to interact with the
>> > FileSystem (read or write) through any kind of web interface.
>> >
>> > Aaron
>> >
>> >
>> > On Thu, Sep 12, 2013 at 4:29 PM, Chris Rohr <[email protected]> wrote:
>> >
>> > > Hi all,
>> > >
>> > > As we look to take the console to the next level, I was wondering what
>> > > everyone's thoughts are on the usefulness of the HDFS portion of the
>> > > console. This portion has the following features:
>> > >
>> > > 1. File system browser
>> > >     a. Viewer
>> > >     b. Upload
>> > >     c. Rename
>> > >     d. Delete
>> > > 2. Node status
>> > > 3. Capacity status
>> > >
>> > > I am just wondering if this should be here or if we should just rip it
>> > > out and let people use something like Hue for this information.
>> > >
>> > > Chris
>> > >
>> >
>>
>
>
