Re: [GSoC 2015][COMDEV-119] Zeppelin GSoC Project: add more D3 visualization

madhuka udantha Thu, 26 Mar 2015 23:52:32 -0700

Hi Alex,

This is the email thread I have been using to collaborate with the Zeppelin
community [1].
I am currently writing the proposal based on the points discussed here and on
JIRA. If I have missed anything with regard to the task, please feel free to
share it.
I'll share my proposal when I finish writing it.


Thanks.


[1] https://issues.apache.org/jira/browse/COMDEV-119

On Wed, Mar 25, 2015 at 11:09 PM, madhuka udantha <madhukaudan...@gmail.com>
wrote:

> Hi,
>
> Following the discussion, and to get a clear understanding, I drew two
> sequence diagrams that explain how the chart will react to a change of
> pivot.
>
> Safe Level
>
> https://issues.apache.org/jira/secure/attachment/12707251/Changing%20the%20pivot%20-%20Safe%20Level.png
>
>
> In the Safe Level (the default level) only a limited amount of data is
> retrieved (sufficient to draw the chart).
> At the initial stage, local storage doesn't contain any data. But when you
> make a pivot change, the data will already be there to draw the graph. If the
> data is outdated we will get it from the back end.
>
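The Safe Level lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `LocalStore`, `rows_for_pivot`, and the `max_age_seconds` staleness rule are hypothetical names and choices, not part of Zeppelin or of the proposal itself.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


class LocalStore:
    """Hypothetical stand-in for the client-side local storage."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age_seconds
        self._clock = clock
        self._entry: Optional[Tuple[float, List[Row]]] = None

    def put(self, rows: List[Row]) -> None:
        # Remember when the rows were stored so staleness can be checked later.
        self._entry = (self._clock(), rows)

    def get_fresh(self) -> Optional[List[Row]]:
        # Return the stored rows, or None if nothing is stored or it is outdated.
        if self._entry is None:
            return None
        stored_at, rows = self._entry
        if self._clock() - stored_at > self._max_age:
            return None
        return rows


def rows_for_pivot(store: LocalStore,
                   fetch_from_backend: Callable[[], List[Row]]) -> List[Row]:
    """Serve a pivot change from local storage, falling back to the back end."""
    rows = store.get_fresh()
    if rows is None:
        # Initial stage or outdated data: go to the back end and refresh storage.
        rows = fetch_from_backend()
        store.put(rows)
    return rows
```

With this shape, the first chart draw calls the back end once, later pivot changes are served locally, and a back-end call happens again only after the stored data is considered outdated.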
> Restricted Level
>
> https://issues.apache.org/jira/secure/attachment/12707250/Changing%20the%20pivot%20-%20Restricted%20Level.png
>
> The user will reach the Restricted Level after successfully passing the Safe
> Level. By then local storage will hold up-to-date data. But this level
> uses all the data in the database, so charting will grab the data from both
> local storage and the back end.
>
> Your ideas are most welcome.
>
>
> On Wed, Mar 25, 2015 at 2:55 PM, madhuka udantha <madhukaudan...@gmail.com
> > wrote:
>
>> Hi,
>>
>> I would like to learn about the code structure and the Zeppelin
>> architecture. Is there a good post, article, or wiki page on these?
>> Also, if there is a quick start guide for Zeppelin development, please
>> share it with me.
>>
>> Thanks.
>>
>> On Mon, Mar 23, 2015 at 10:44 AM, madhuka udantha <
>> madhukaudan...@gmail.com> wrote:
>>
>>> Hi, moon
>>>
>>> Yes, since
>>>
>>>> "Moving computation is cheaper than moving data"
>>>
>>> we can do the computation in the computing framework.
>>>
>>> Simple pivot changes or filtering can be handled in local storage with an
>>> indexed database, depending on the current user level.
>>> As you said, computations will be handled in the back ends.
>>>
>>> Great to hear about building a rich GUI; I will share my chart library
>>> ideas there.
>>>
>>> Your ideas are always welcome; they will be helpful for my task and
>>> draft proposal.
>>>
>>> Thanks
>>>
>>> On Mon, Mar 23, 2015 at 7:59 AM, moon soo Lee <m...@apache.org> wrote:
>>>
>>>> Hi, madhuka udantha
>>>>
>>>> I think your idea about a chart library and a data transformation engine
>>>> sounds cool. For the data transform modules, it's a good idea to make them
>>>> pluggable into the data transform engine. But I'm not sure that getting the
>>>> result locally and transforming it for pivoting or filtering, to avoid
>>>> running the query again, is a good idea.
>>>> Zeppelin is (not limited to, but) trying to build an analytical
>>>> environment on top of distributed computing frameworks like Spark, Flink,
>>>> Ignite, etc. Most of the distributed computing frameworks Zeppelin is
>>>> trying to integrate follow the same paradigm: "Moving computation is
>>>> cheaper than moving data". Accordingly, the size of the data that the
>>>> transform engine needs to handle can easily be multiple TB, which would
>>>> take a long time to copy to a local machine and process. So I think the
>>>> transform module should run on the underlying distributed computing
>>>> framework.
>>>>
>>>> As for the chart library, we have started a discussion thread about
>>>> building a rich GUI inside the notebook. It might be related.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>>
>>>>
>>>> On Mon, Mar 23, 2015 at 2:27 AM madhuka udantha <
>>>> madhukaudan...@gmail.com>
>>>> wrote:
>>>>
>>>> > On Sun, Mar 22, 2015 at 7:05 PM, Corneau Damien <
>>>> cornead...@apache.org>
>>>> > wrote:
>>>> >
>>>> > > Hi,
>>>> > >
>>>> > > Being able to aggregate on the query side is a great idea and would
>>>> allow
>>>> > > us to transfer less data as well as having a full query
>>>> representation of
>>>> > > the visualization.
>>>> > >
>>>> > > However creating a SQL query dynamically is a pretty difficult
>>>> task, and
>>>> > > might be too much for that scope.
>>>> > >
>>>> > > Also I see some possible problems with this method:
>>>> > >  - Changing the pivot or simple filtering would mean running the
>>>> query
>>>> > > again
>>>> > >
>>>> > No, the query won't run again.
>>>> > On the first run of the query, the data is collected and stored locally
>>>> > in local storage [1] (using indexing techniques to make retrieval
>>>> > faster), so changing the pivot or simple filtering will use the local
>>>> > storage.
>>>> > If any attribute or data is missing from local storage, then only that
>>>> > will be retrieved, which saves network bandwidth as well.
>>>> > Does my explanation make sense?
>>>> >
>>>> >
>>>> >
>>>> > >  - Being able to make pivot style SQL query would be really hard,
>>>> > >    we would need multiple sub-queries or even some times multiple
>>>> queries
>>>> > > (I tried a few times and could have the result wanted only with
>>>> > > visualization side pivot).
>>>> > >    It would end up with really bad SQL queries, especially with the
>>>> Hive
>>>> > > SQL or Spark SQL limitations and would take way more time to
>>>> process.
>>>> > >
>>>> > Agreed. I'm not planning to use pivot-style queries.
>>>> >
>>>> > Any suggestions?
>>>> >
>>>> >
>>>> > Thanks.
>>>> >
>>>> >
>>>> > > On Sun, Mar 22, 2015 at 10:08 PM, IT CTO <goi....@gmail.com> wrote:
>>>> > >
>>>> > > > Hi,
>>>> > > >
>>>> > > > The chart library features sound promising.
>>>> > > > As for the data engine, one thing that I think is missing is the
>>>> > > > ability to use the visualization to drive the aggregation in the SQL.
>>>> > > > Today, you first write the SQL, you execute it, *limited by the number
>>>> > > > of results sent to the client*, and then you use viz to understand the
>>>> > > > results.
>>>> > > > Alternatively, if through the visualization I can generate a better SQL
>>>> > > > query that returns an aggregated data set, then I can analyze a bigger
>>>> > > > amount of data.
>>>> > > >
>>>> > > > I hope I was clear enough in my explanation :-)
>>>> > > >
>>>> > > > Eran
>>>> > > >
>>>> > > >
>>>> > > > On Fri, Mar 20, 2015 at 8:21 AM, madhuka udantha <
>>>> > > madhukaudan...@gmail.com
>>>> > > > >
>>>> > > > wrote:
>>>> > > >
>>>> > > > > Hi,
>>>> > > > >
>>>> > > > > Here are my proposed ideas.
>>>> > > > > According to the COMDEV-119 JIRA, charts have been hard-coded until
>>>> > > > > now, and a data transformation issue was highlighted, since different
>>>> > > > > charts have different pivot fields, e.g. area charts, scatter charts,
>>>> > > > > surface charts, bubble charts, radar charts, etc.
>>>> > > > >
>>>> > > > > To solve this I am introducing two major components: a 'Chart
>>>> > > > > library' and a 'Data transformation engine'. The chart library is
>>>> > > > > the place that shows the charts that are currently plugged in. There
>>>> > > > > we can plug in chart types, and those can be reused.
>>>> > > > >
>>>> > > > > *Chart library features *
>>>> > > > >
>>>> > > > >    - Users can select a chart from the library
>>>> > > > >    - Charts are pluggable into the library
>>>> > > > >    - Charts can be plugged in via config (JSON) or a UI wizard
>>>> > > > >    - The configuration/meta file of a chart contains its interface,
>>>> > > > >    libs, themes, and data transformation types/mappings
>>>> > > > >
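As a rough illustration of plugging a chart into the library through a JSON configuration, the sketch below registers one chart type. The key names (`name`, `interface`, `libs`, `themes`, `transformations`) and the `ChartLibrary` class are assumptions made for this example; the thread does not define an actual schema.

```python
import json
from typing import Dict, List

# Hypothetical keys, loosely following "interface, libs, themes and data
# transformation types/mappings" from the feature list.
REQUIRED_KEYS = ("name", "interface", "libs", "themes", "transformations")

# Example configuration for one pluggable chart (all values are illustrative).
BUBBLE_CHART = """
{
  "name": "bubble",
  "interface": {"pivot_fields": ["x", "y", "size"]},
  "libs": ["d3"],
  "themes": ["default"],
  "transformations": ["aggregate", "round"]
}
"""


class ChartLibrary:
    """Hypothetical registry of pluggable chart types."""

    def __init__(self) -> None:
        self._charts: Dict[str, dict] = {}

    def plug(self, config_json: str) -> str:
        # Parse the chart's configuration and reject incomplete ones.
        config = json.loads(config_json)
        missing = [key for key in REQUIRED_KEYS if key not in config]
        if missing:
            raise ValueError(f"chart config is missing keys: {missing}")
        self._charts[config["name"]] = config
        return config["name"]

    def available(self) -> List[str]:
        # Names of the charts a user could select from the library.
        return sorted(self._charts)

    def pivot_fields(self, name: str) -> List[str]:
        return list(self._charts[name]["interface"]["pivot_fields"])
```

A chart plugged this way can be listed with `available()` and reused by name, and its declared pivot fields are what a data transformation step would need to fill.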
>>>> > > > >
>>>> > > > >
>>>> > > > > *Data Transformation Engine*
>>>> > > > > The 'Data transformation engine' contains data transformation
>>>> > > > > modules. Those modules are also pluggable into the engine, and they
>>>> > > > > have connections to charts. The data transformation engine sits
>>>> > > > > between the data (SQL) and the chart, so it converts the data and
>>>> > > > > maps it to each chart's pivot fields.
>>>> > > > >
>>>> > > > >    - This module will look at the pivot fields of the chart
>>>> > > > >    - Selected attributes of the SQL query
>>>> > > > >    - Attribute value operations (string split, value
>>>> > > > >    aggregation, number rounding)
>>>> > > > >
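One possible reading of such a transformation module is sketched below: it maps the selected query columns onto a chart's pivot fields and then applies a value operation (sum aggregation with rounding). The function names and the shape of the `mapping` argument are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def to_pivot_fields(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename query columns to pivot field names.

    `mapping` goes from pivot field name to query column name,
    for example {"x": "country", "y": "sales"}.
    """
    return [{field: row[column] for field, column in mapping.items()}
            for row in rows]


def aggregate_sum(rows: List[Row], key_field: str, value_field: str,
                  ndigits: int = 2) -> List[Row]:
    """Sum `value_field` per distinct `key_field` and round the totals."""
    totals: Dict[object, float] = defaultdict(float)
    for row in rows:
        totals[row[key_field]] += float(row[value_field])
    # Keys come out in first-seen order because dicts preserve insertion order.
    return [{key_field: key, value_field: round(total, ndigits)}
            for key, total in totals.items()]
```

For example, query rows with `country` and `sales` columns could be mapped to `x` and `y` with `to_pivot_fields` and then collapsed to one point per country with `aggregate_sum`.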
>>>> > > > >
>>>> > > > > Another improvement that I noticed is
>>>> > > > >
>>>> > > > >    - Query Edit auto-completion support (with Ctrl+space)
>>>> > > > >
>>>> > > > >
>>>> > > > > Your ideas are welcome here
>>>> > > > > Thanks
>>>> > > > >
>>>> > > > > On Fri, Mar 20, 2015 at 10:57 AM, madhuka udantha <
>>>> > > > > madhukaudan...@gmail.com>
>>>> > > > > wrote:
>>>> > > > >
>>>> > > > > > Hi All,
>>>> > > > > >
>>>> > > > > > I'm Udantha, an MSc student at the University of Moratuwa. This
>>>> > > > > > GSoC 2015 project, COMDEV-119, has captured my interest.
>>>> > > > > >
>>>> > > > > > I have extensive experience with visualization techniques, having
>>>> > > > > > created numerous dashboards [1,2] with JavaScript, HTML5, AngularJS,
>>>> > > > > > D3 charting, etc.
>>>> > > > > >
>>>> > > > > > My current research area is big data, where I have worked with
>>>> > > > > > various types of data sets. I'm also working with cluster
>>>> > > > > > representation and classification techniques, where visualization
>>>> > > > > > amounts to a considerable part. I have been following COMDEV-119
>>>> > > > > > (JIRA) with Alexander Bezzubov and CORNEAU Damien for more than a
>>>> > > > > > week.
>>>> > > > > >
>>>> > > > > > Thanks
>>>> > > > > >
>>>> > > > > > [1] http://wso2.com/products/user-engagement-server/
>>>> > > > > > [2] https://github.com/wso2/jaggery
>>>> > > > > > --
>>>> > > > > > Cheers,
>>>> > > > > > Madhuka Udantha
>>>> > > > > > http://madhukaudantha.blogspot.com
>>>> > > > > >
>>>> > > > >
>>>> > > > >
>>>> > > > >
>>>> > > > > --
>>>> > > > > Cheers,
>>>> > > > > Madhuka Udantha
>>>> > > > > http://madhukaudantha.blogspot.com
>>>> > > > >
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > --
>>>> > > > Eran | CTO
>>>> > > >
>>>> > >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Cheers,
>>>> > Madhuka Udantha
>>>> > http://madhukaudantha.blogspot.com
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Madhuka Udantha
>>> http://madhukaudantha.blogspot.com
>>>
>>
>>
>>
>> --
>> Cheers,
>> Madhuka Udantha
>> http://madhukaudantha.blogspot.com
>>
>
>
>
> --
> Cheers,
> Madhuka Udantha
> http://madhukaudantha.blogspot.com
>



-- 
Cheers,
Madhuka Udantha
http://madhukaudantha.blogspot.com
