[ https://issues.apache.org/jira/browse/MAHOUT-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Musselman updated MAHOUT-2142:
-------------------------------------
    Description: 
*About*

This is the discussion and planning epic for adding blockchain data sources
and analytics use cases. The proposal is to support a new kind of data source,
namely Ethereum-compatible ledgers, and to pick a few compelling use cases to
build out this year.

We will add children to this epic for specific work items.

*Example Use Cases*
 # Search indexes over given ledgers
 # Similarity of an account to other accounts on the same ledger, computed 
from activity history
 # Time-series analysis of gas (transaction) fees across multiple ledgers
 # Time-series analysis of transactions (overall count per 
week/month/year/custom period, per user account, etc.) for a list of ledgers, 
to compare usage
 # Max/Min range of transactions for different ledgers
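
As a rough illustration of the transaction time-series and max/min use cases above, the following Python sketch buckets transaction timestamps into ISO weeks and reports the range of the weekly counts. The function names and the sample timestamps are hypothetical and chosen only for illustration; a real pipeline would read timestamps from ledger data and would likely run on Mahout rather than in plain Python.

```python
from collections import Counter
from datetime import datetime, timezone


def weekly_counts(timestamps):
    """Count transactions per ISO week from Unix timestamps (seconds, UTC)."""
    counts = Counter()
    for ts in timestamps:
        iso = datetime.fromtimestamp(ts, tz=timezone.utc).isocalendar()
        counts[(iso[0], iso[1])] += 1  # key: (ISO year, ISO week number)
    return dict(counts)


def count_range(counts):
    """Return (min, max) of the per-period transaction counts."""
    values = list(counts.values())
    return min(values), max(values)


# Synthetic timestamps for illustration only, not real ledger data:
# 2022-01-01, 2022-01-02 and 2022-01-09 (all 00:00 UTC).
sample = [1640995200, 1641081600, 1641686400]
per_week = weekly_counts(sample)
low, high = count_range(per_week)
```

Running the same two functions over several ledgers would give the comparative view described in the list.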

 
*How to Get Started*
To explore ledger operations and data, install go-ethereum (geth: 
[https://geth.ethereum.org/docs/install-and-build/installing-geth]) and sync it 
against a network to download its historical records. The Goerli test network's 
entire three years of data is only 32GB, so there are data sets small enough to 
experiment with. By default geth stores its data files on your local disk 
under ~/.ethereum (on Linux).
 
There are libraries that interact live with any given ledger, including web3.js 
([https://web3js.readthedocs.io/en/v1.5.2/]) and Web3.py 
([https://web3py.readthedocs.io/en/stable/]), so reading data out of a ledger 
is straightforward.
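
To show roughly what such a read looks like without a library dependency, here is a minimal Python sketch that builds and sends a standard eth_getBlockByNumber JSON-RPC call using only the standard library. It assumes a node with its HTTP RPC endpoint enabled at http://localhost:8545; that URL and the function names block_request and fetch_block are assumptions for illustration, not part of the proposal. In practice web3.js or Web3.py would wrap this call.

```python
import json
import urllib.request


def block_request(number, full_transactions=True, request_id=1):
    """Build an eth_getBlockByNumber JSON-RPC payload for a block number."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        # Quantities are hex-encoded; the flag asks for full transaction objects.
        "params": [hex(number), full_transactions],
        "id": request_id,
    }


def fetch_block(number, url="http://localhost:8545"):
    """POST the request to a node (assumed URL) and return the block dict."""
    data = json.dumps(block_request(number)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]  # None if the node lacks the block
```

Calling fetch_block(16) against a synced node would return the block as a dict whose "transactions" entry lists the transaction objects.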
 
Reading and indexing the actual data might mean writing custom parsers for 
Mahout and Lucene, and possibly decompiling contract bytecode back into 
readable Solidity code, so there are pieces we would need to plan out.
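
One possible shape for such a parser, sketched in Python under stated assumptions: it flattens a transaction object, as returned by the Ethereum JSON-RPC API with hex-encoded quantities, into a flat record that an indexer could consume. The output field names and the sample transaction are hypothetical, and this is not an existing Mahout or Lucene interface.

```python
def hex_to_int(quantity):
    """Decode a JSON-RPC hex quantity such as '0x10' into an int."""
    return int(quantity, 16)


def transaction_document(tx, block_timestamp):
    """Flatten a JSON-RPC transaction dict into an indexable record."""
    return {
        "hash": tx["hash"],
        "from": tx["from"],
        "to": tx.get("to"),  # None for contract-creation transactions
        "value_wei": hex_to_int(tx["value"]),
        "gas_price_wei": hex_to_int(tx["gasPrice"]),
        "block_number": hex_to_int(tx["blockNumber"]),
        "timestamp": hex_to_int(block_timestamp),
    }


# Synthetic transaction for illustration only, not real ledger data.
sample_tx = {
    "hash": "0xabc",
    "from": "0x01",
    "to": None,
    "value": "0xde0b6b3a7640000",  # 10**18 wei
    "gasPrice": "0x3b9aca00",      # 10**9 wei
    "blockNumber": "0x10",
}
doc = transaction_document(sample_tx, "0x61cf9980")
```

Records in this form carry the numeric fields (value, gas price, timestamp) needed for the fee and usage analyses listed earlier.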

  was:
*About*

Proposal is to provide a new data source, namely any number of 
ethereum-compatible ledgers, and pick a few compelling use cases to build out 
this year.

We will add children to this epic for specific work items.

*Example Use Cases*
 # Search-indexes of given ledgers
 # Computed similarity to other accounts on the same ledger based on activity 
history
 # Time-series analysis of gas (transaction) fees across multiple ledgers
 # Time-series analysis of transactions (overall # per week/month/year/custom 
period, by user account etc.) for a list of ledgers. (Comparative analysis of 
usage)
 # Max/Min range of transactions for different ledgers

 
*How to Get Started*
To explore ledger operations and data, get a copy of go-ethereum (geth: 
[https://geth.ethereum.org/docs/install-and-build/installing-geth]) and run it 
against a network to get all historical records. The Goerli test network's 
entire three years of data is only 32GB, so there are small enough data sets to 
play with, and the data files are stored on your local disk by default at 
~/ethereum.
 
There are libraries that interact live with any given ledger including Web3JS 
([https://web3js.readthedocs.io/en/v1.5.2/]) and Web3.py 
([https://web3py.readthedocs.io/en/stable/]), so reading out of ledgers is 
simple.
 
Reading and indexing the actual data might mean writing custom parsers for 
Mahout and Lucene, and possibly getting into decompiling bytecode back into 
readable Solidity code, so there are pieces we would need to plan out.

        Summary: Blockchain Data and Analytics  (was: Discussion and planning 
epic for adding blockchain data sources and analytics use cases)

> Blockchain Data and Analytics
> -----------------------------
>
>                 Key: MAHOUT-2142
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-2142
>             Project: Mahout
>          Issue Type: Epic
>            Reporter: Andrew Musselman
>            Assignee: Andrew Musselman
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
