Re: [GSoC 2015][COMDEV-119] Zeppelin GSoC Project: add more D3 visualization

madhuka udantha Wed, 25 Mar 2015 10:40:50 -0700

Hi,

Following the discussion, and to make the idea clearer, I drew two
sequence diagrams that explain how the chart reacts to a pivot change.


Safe Level
https://issues.apache.org/jira/secure/attachment/12707251/Changing%20the%20pivot%20-%20Safe%20Level.png


In the Safe Level (the default level), only a limited amount of data is
retrieved (enough to draw the chart).
Initially, local storage contains no data. After the first query, a pivot
change can draw the graph from the data already in local storage. If that
data is out of date, we fetch it again from the back-end.
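
A minimal sketch of the Safe Level flow described above, written in Python
only for brevity (the real front end would be JavaScript). The class name,
the row limit, the maximum age, and the fake back-end are all hypothetical
and are not taken from Zeppelin:

```python
import time


class SafeLevelCache:
    """Hypothetical stand-in for the browser's local storage."""

    def __init__(self, fetch, max_age_seconds=60.0, row_limit=1000,
                 clock=time.monotonic):
        self._fetch = fetch            # callable(limit) -> list of rows
        self._max_age = max_age_seconds
        self._row_limit = row_limit    # "limited amount of data"
        self._clock = clock
        self._rows = None              # empty at the initial stage
        self._stored_at = None

    def rows_for_chart(self):
        now = self._clock()
        stale = self._rows is None or now - self._stored_at > self._max_age
        if stale:
            # Missing or out-of-date data: go to the back-end.
            self._rows = self._fetch(self._row_limit)
            self._stored_at = now
        return self._rows


# Assumed back-end for illustration; it records every call it receives.
calls = []

def fake_backend(limit):
    calls.append(limit)
    return [{"city": "Seoul", "sales": 10}][:limit]

now = [0.0]  # controllable clock so the example is deterministic
cache = SafeLevelCache(fake_backend, clock=lambda: now[0])
first = cache.rows_for_chart()   # local storage empty: back-end is queried
second = cache.rows_for_chart()  # pivot change: served from local storage
now[0] = 120.0
third = cache.rows_for_chart()   # data is out of date: back-end is queried
```

Only the first and third requests reach the back-end; the pivot change in
between is served locally.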

Restricted Level
https://issues.apache.org/jira/secure/attachment/12707250/Changing%20the%20pivot%20-%20Restricted%20Level.png

The user reaches the Restricted Level after successfully passing the Safe
Level. By then local storage holds up-to-date data, but this level uses all
the data in the database, so the chart takes its data from both local
storage and the back-end.
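
One possible way to combine the two sources, again as a hedged Python
sketch. It assumes every row carries a unique "id" field and that the
back-end can be asked for only the rows that are not stored locally; both
are assumptions for illustration, not part of the diagrams:

```python
def rows_for_restricted_level(local_rows, fetch_missing):
    """Combine rows already in local storage with the rest from the back-end.

    fetch_missing is a hypothetical callable that takes the set of ids
    already held locally and returns the remaining rows.
    """
    known_ids = {row["id"] for row in local_rows}
    remote_rows = fetch_missing(known_ids)
    # Skip any row the back-end returns that we already hold locally.
    extra = [row for row in remote_rows if row["id"] not in known_ids]
    return list(local_rows) + extra


# Assumed data for illustration only.
database = [
    {"id": 1, "sales": 10},
    {"id": 2, "sales": 20},
    {"id": 3, "sales": 30},
]
local_storage = database[:1]  # what the Safe Level already fetched

def fake_fetch_missing(known_ids):
    return [row for row in database if row["id"] not in known_ids]

combined = rows_for_restricted_level(local_storage, fake_fetch_missing)
combined_ids = [row["id"] for row in combined]
```

Only the rows that are missing locally travel over the network, which keeps
the bandwidth saving mentioned earlier in the thread.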

Your ideas are most welcome.


On Wed, Mar 25, 2015 at 2:55 PM, madhuka udantha <madhukaudan...@gmail.com>
wrote:

> Hi,
>
> I would like to learn about the code structure and architecture of
> Zeppelin. Is there a good post, article, or wiki page on this?
> Also, if there is a quick start guide for Zeppelin development,
> please share it with me.
>
> Thanks.
>
> On Mon, Mar 23, 2015 at 10:44 AM, madhuka udantha <
> madhukaudan...@gmail.com> wrote:
>
>> Hi, moon
>>
>> Yes, Since
>>
>>> "Moving computation is cheaper than moving data"
>>
>> We can do the computation in the computing framework.
>>
>> Simple pivot changes or filtering can be handled in local storage
>> with an indexed database, depending on the current user level.
>> As you noted, the computations will be handled in the back-end.
>>
>> Great to hear about building a rich GUI; I will share my chart library
>> ideas there.
>>
>> Your ideas are always welcome. They will be helpful for my task and
>> draft proposal.
>>
>> Thanks
>>
>> On Mon, Mar 23, 2015 at 7:59 AM, moon soo Lee <m...@apache.org> wrote:
>>
>>> Hi, madhuka udantha
>>>
>>> I think your idea about a chart library and data transformation engine
>>> sounds cool. For the data transform modules, it's a good idea to make
>>> them pluggable into the data transform engine. But I'm not sure that
>>> fetching the result locally and transforming it there for pivoting or
>>> filtering, to avoid running the query again, is a good idea.
>>> Zeppelin is (not exclusively) trying to build an analytical
>>> environment on top of distributed computing frameworks such as Spark,
>>> Flink, Ignite, etc. Most of the frameworks Zeppelin is trying to
>>> integrate follow the same paradigm: "Moving computation is cheaper
>>> than moving data". Accordingly, the data that the transform engine
>>> needs to handle can easily reach multiple TB, which would take a long
>>> time to copy to a local machine and process. So I think the transform
>>> module should run on the underlying distributed computing framework.
>>>
>>> As for the chart library, we have started a discussion thread about
>>> building a rich GUI inside the notebook. It might be related.
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>>
>>> On Mon, Mar 23, 2015 at 2:27 AM madhuka udantha <
>>> madhukaudan...@gmail.com>
>>> wrote:
>>>
>>> > On Sun, Mar 22, 2015 at 7:05 PM, Corneau Damien <cornead...@apache.org
>>> >
>>> > wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > Being able to aggregate on the query side is a great idea. It would
>>> > > allow us to transfer less data and to have a full query
>>> > > representation of the visualization.
>>> > >
>>> > > However, creating a SQL query dynamically is a pretty difficult
>>> > > task, and might be too much for that scope.
>>> > >
>>> > > Also, I see some possible problems with this method:
>>> > >  - Changing the pivot or simple filtering would mean running the
>>> > >    query again
>>> > >
>>> > No, the query won't run again.
>>> > On the first run of the query, the data is collected and stored in
>>> > local storage [1] (using indexing techniques to make retrieval
>>> > faster), so changing the pivot or simple filtering will use local
>>> > storage.
>>> > If any attribute or data is missing from local storage, only that
>>> > part will be retrieved, which also saves network bandwidth.
>>> > Does my explanation make sense?
>>> >
>>> >
>>> >
>>> > >  - Being able to make a pivot-style SQL query would be really hard;
>>> > >    we would need multiple sub-queries or sometimes even multiple
>>> > >    queries (I tried a few times and could get the wanted result
>>> > >    only with a visualization-side pivot).
>>> > >    It would end up with really bad SQL queries, especially with the
>>> > >    Hive SQL or Spark SQL limitations, and would take way more time
>>> > >    to process.
>>> > >
>>> > Agreed. I'm not planning to use pivot-style queries.
>>> >
>>> > Any suggestions?
>>> >
>>> >
>>> > Thanks.
>>> >
>>> >
>>> > > On Sun, Mar 22, 2015 at 10:08 PM, IT CTO <goi....@gmail.com> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > The chart library features sound promising.
>>> > > > As for the data engine, one thing that I think is missing is the
>>> > > > ability to use the visualization to drive the aggregation in the
>>> > > > SQL. Today, you first write the SQL, you execute it, *limited by
>>> > > > the number of results sent to the client*, and then you use viz
>>> > > > to understand the results.
>>> > > > Alternatively, if through the visualization I can generate a
>>> > > > better SQL query which returns an aggregated data set, then I can
>>> > > > analyze a bigger amount of data.
>>> > > >
>>> > > > I hope I was clear enough in my explanation :-)
>>> > > >
>>> > > > Eran
>>> > > >
>>> > > >
>>> > > > On Fri, Mar 20, 2015 at 8:21 AM, madhuka udantha <
>>> > > madhukaudan...@gmail.com
>>> > > > >
>>> > > > wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > Here are my proposed ideas.
>>> > > > > According to the COMDEV-119 JIRA issue, charts have been hard
>>> > > > > coded until now, and a data transformation issue was highlighted,
>>> > > > > since different charts have different pivot fields, e.g. area
>>> > > > > charts, scatter charts, surface charts, bubble charts, radar
>>> > > > > charts, etc.
>>> > > > >
>>> > > > > To solve this I am introducing two major components: a 'Chart
>>> > > > > library' and a 'Data transformation engine'. The chart library
>>> > > > > shows the charts that are currently plugged in. There we can
>>> > > > > plug in chart types, and those can be reused.
>>> > > > >
>>> > > > > *Chart library features*
>>> > > > >
>>> > > > >    - Users can select a chart from the library
>>> > > > >    - Charts are pluggable into the library
>>> > > > >    - Charts can be plugged in by config (JSON) or through a UI wizard
>>> > > > >    - The configuration/meta file of a chart contains its interface,
>>> > > > >    libs, themes and data transformation types/mappings
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > *Data Transformation Engine*
>>> > > > > The 'Data transformation engine' contains data transformation
>>> > > > > modules. Those modules are also pluggable into the engine, and
>>> > > > > they have connections to charts. The data transformation engine
>>> > > > > sits between the data (SQL) and the chart, so this module
>>> > > > > converts the data and maps it to each chart's pivot fields.
>>> > > > >
>>> > > > >    - This module will look at the pivot fields of the chart
>>> > > > >    - Selected attributes of the SQL query
>>> > > > >    - Attribute value operation improvements (string split, value
>>> > > > >    aggregation, number rounding)
>>> > > > >
>>> > > > >
>>> > > > > Another improvement that I noticed:
>>> > > > >
>>> > > > >    - Query Edit auto-completion support (with Ctrl+space)
>>> > > > >
>>> > > > >
>>> > > > > Your ideas are welcome here
>>> > > > > Thanks
>>> > > > >
>>> > > > > On Fri, Mar 20, 2015 at 10:57 AM, madhuka udantha <
>>> > > > > madhukaudan...@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Hi All,
>>> > > > > >
>>> > > > > > I'm Udantha, an MSc student at the University of Moratuwa. This
>>> > > > > > GSoC 2015 project, COMDEV-119, has captured my interest.
>>> > > > > >
>>> > > > > > I have extensive experience with visualization techniques,
>>> > > > > > having created numerous dashboards [1,2] with JavaScript, HTML5,
>>> > > > > > AngularJS, D3 charting, etc.
>>> > > > > >
>>> > > > > > My current research area is big data, where I have worked with
>>> > > > > > various types of data sets. I am also working on cluster
>>> > > > > > representation and classification techniques, where
>>> > > > > > visualization makes up a considerable part. I have been
>>> > > > > > following COMDEV-119 (JIRA) with Alexander Bezzubov and CORNEAU
>>> > > > > > Damien for more than a week.
>>> > > > > >
>>> > > > > > Thanks
>>> > > > > >
>>> > > > > > [1] http://wso2.com/products/user-engagement-server/
>>> > > > > > [2] https://github.com/wso2/jaggery
>>> > > > > > --
>>> > > > > > Cheers,
>>> > > > > > Madhuka Udantha
>>> > > > > > http://madhukaudantha.blogspot.com
>>> > > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > --
>>> > > > > Cheers,
>>> > > > > Madhuka Udantha
>>> > > > > http://madhukaudantha.blogspot.com
>>> > > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Eran | CTO
>>> > > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Cheers,
>>> > Madhuka Udantha
>>> > http://madhukaudantha.blogspot.com
>>> >
>>>
>>
>>
>>
>> --
>> Cheers,
>> Madhuka Udantha
>> http://madhukaudantha.blogspot.com
>>
>
>
>
> --
> Cheers,
> Madhuka Udantha
> http://madhukaudantha.blogspot.com
>



-- 
Cheers,
Madhuka Udantha
http://madhukaudantha.blogspot.com
