Andrew Musselman created MAHOUT-2142:
----------------------------------------
Summary: Discussion and planning epic for adding blockchain data
sources and analytics use cases
Key: MAHOUT-2142
URL: https://issues.apache.org/jira/browse/MAHOUT-2142
Project: Mahout
Issue Type: Epic
Reporter: Andrew Musselman
Assignee: Andrew Musselman
*About*
The proposal is to add a new data source, namely any number of
Ethereum-compatible ledgers, and to pick a few compelling use cases to build
out this year.
We will add children to this epic for specific work items.
*Example Use Cases*
# Search indexes over given ledgers
# Computed similarity between accounts on the same ledger, based on activity
history
# Time-series analysis of gas (transaction) fees across multiple ledgers
# Time-series analysis of transaction volume (overall count per
week/month/year/custom period, per user account, etc.) for a list of ledgers,
as a comparative analysis of usage
# Max/Min range of transactions for different ledgers
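As a rough illustration of two of the use cases above (per-week transaction counts with gas prices, and account similarity), here is a minimal sketch in plain Python. The record layout, the sample data, and the function names are all hypothetical, and Jaccard similarity over recipient sets is just one simple stand-in for "similarity based on activity history"; none of this is an existing Mahout API.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median

# Hypothetical, simplified transaction records (not a real ledger export).
# Timestamps are Unix seconds (UTC); gas prices are arbitrary integer units.
SAMPLE_TXS = [
    {"from": "0xA", "to": "0xC", "timestamp": 1646611200, "gas_price": 30},
    {"from": "0xA", "to": "0xD", "timestamp": 1646697600, "gas_price": 50},
    {"from": "0xB", "to": "0xC", "timestamp": 1647216000, "gas_price": 40},
    {"from": "0xB", "to": "0xE", "timestamp": 1647302400, "gas_price": 20},
]


def weekly_summary(txs):
    """Group transactions by ISO (year, week); report count and median gas price."""
    buckets = defaultdict(list)
    for tx in txs:
        day = datetime.fromtimestamp(tx["timestamp"], tz=timezone.utc)
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(iso[0], iso[1])].append(tx["gas_price"])
    return {
        week: {"count": len(prices), "median_gas_price": median(prices)}
        for week, prices in sorted(buckets.items())
    }


def counterparty_similarity(txs, account_a, account_b):
    """Jaccard similarity of the sets of addresses each account has sent to."""
    recipients = defaultdict(set)
    for tx in txs:
        recipients[tx["from"]].add(tx["to"])
    a, b = recipients[account_a], recipients[account_b]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


summary = weekly_summary(SAMPLE_TXS)
similarity = counterparty_similarity(SAMPLE_TXS, "0xA", "0xB")
```

At real ledger scale the same grouping and similarity logic would presumably run as a distributed job rather than in-memory Python; the sketch is only meant to pin down what the outputs could look like.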
*How to Get Started*
To explore ledger operations and data, get a copy of go-ethereum (geth:
[https://geth.ethereum.org/docs/install-and-build/installing-geth]) and run it
against a network to sync all historical records. The Goerli test network's
entire three years of data is only 32GB, so there are data sets small enough to
play with. By default geth stores the data files on your local disk, under
~/.ethereum on Linux.
There are libraries that interact live with any given ledger, including web3.js
([https://web3js.readthedocs.io/en/v1.5.2/]) and Web3.py
([https://web3py.readthedocs.io/en/stable/]), so reading data out of ledgers is
straightforward.
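To give a feel for what reading out of a ledger looks like, here is a minimal Web3.py sketch that walks a block range and flattens each transaction into a plain dict. It assumes a node exposing JSON-RPC over HTTP at http://127.0.0.1:8545 (an assumed local endpoint) and a recent Web3.py release; the helper names and the output field names are hypothetical choices for illustration.

```python
def connect(endpoint_url="http://127.0.0.1:8545"):
    """Connect to a JSON-RPC endpoint (the default URL is an assumption)."""
    # Imported lazily so iter_transactions can be used and tested without web3.
    from web3 import Web3
    return Web3(Web3.HTTPProvider(endpoint_url))


def iter_transactions(w3, start_block, end_block):
    """Yield one flat record per transaction in blocks start_block..end_block."""
    for number in range(start_block, end_block + 1):
        # full_transactions=True returns transaction objects, not just hashes.
        block = w3.eth.get_block(number, full_transactions=True)
        for tx in block["transactions"]:
            yield {
                "block": block["number"],
                "timestamp": block["timestamp"],
                "from": tx["from"],
                "to": tx["to"],  # None for contract-creation transactions
                "value_wei": tx["value"],
                "gas_price_wei": tx.get("gasPrice"),
            }


# Example usage against a running node (commented out: needs web3 and a node):
# w3 = connect()
# latest = w3.eth.block_number
# for row in iter_transactions(w3, latest - 2, latest):
#     print(row)
```

Records in this flat shape could then be fed to whatever indexing or analysis step we settle on in the child issues.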
Reading and indexing the actual data might mean writing custom parsers for
Mahout and Lucene, and possibly decompiling contract bytecode back into
readable Solidity code, so there are pieces we would need to plan out.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)