[
https://issues.apache.org/jira/browse/BEAM-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891773#comment-15891773
]
SungJunyoung edited comment on BEAM-1439 at 3/7/17 4:46 AM:
------------------------------------------------------------
Hello, I am a third-year computer engineering student at Kyunghee University
in Korea. I learned about this project through the GSoC list, and I am very
interested in the Apache Beam project. I have written a simple pipeline from
the documentation. Contributing to the project by creating examples and
datasets that use advanced pipelines seems very interesting. If there is a
document I should read or an email address I can contact, it would be a great
help to me. Thank you!
P.S. I am also working on translating the Beam documentation on GitHub:
https://github.com/sungjunyoung/apache_beam_doc_ko
> Beam Example(s) exploring public document datasets
> --------------------------------------------------
>
> Key: BEAM-1439
> URL: https://issues.apache.org/jira/browse/BEAM-1439
> Project: Beam
> Issue Type: Wish
> Components: examples-java
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Priority: Minor
> Labels: gsoc2017, java, mentor, python
>
> In Beam, we have examples illustrating counting the occurrences of words and
> performing a basic TF-IDF analysis on the works of Shakespeare (or whatever
> you point it at). It would be even cooler to do these analyses, and more, on
> a much larger data set that is really the subject of current investigations.
> In chatting with professors at the University of Washington, I've learned
> that scholars of many fields would really like to explore new and highly
> customized ways of processing the growing body of publicly-available
> scholarly documents, such as PubMed Central. An example is a query like "show
> me documents where chemical compounds X and Y were both used in the 'method'
> section".
> So I propose a Google Summer of Code project wherein a student writes some
> large-scale Beam pipelines to perform analyses such as term frequency, bigram
> frequency, etc.
> Skills required:
> - Java or Python
> - (nice to have) Working through the Beam getting started materials
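
To make the analyses named in the issue concrete, here is a minimal plain-Python sketch of term-frequency and bigram-frequency counting over a small in-memory list of documents. It is not taken from the issue and does not use the Beam API; the function names (`tokenize`, `bigrams`, `count_terms_and_bigrams`) and the tokenization rule are assumptions made for illustration. In a Beam pipeline the per-document step would plausibly become a `FlatMap` and the counting a per-element count, but that mapping is only a suggestion.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    # Assumption: a simple regex tokenizer is good enough for illustration.
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Return adjacent token pairs: ['a', 'b', 'c'] -> ('a', 'b'), ('b', 'c')."""
    return zip(tokens, tokens[1:])


def count_terms_and_bigrams(documents):
    """Count single terms and bigrams across all documents.

    Bigrams are formed within each document only, so a pair never spans
    the boundary between two documents.
    """
    term_counts = Counter()
    bigram_counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        term_counts.update(tokens)
        bigram_counts.update(bigrams(tokens))
    return term_counts, bigram_counts


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read a public document set.
    docs = ["to be or not to be", "To be is to do"]
    terms, pairs = count_terms_and_bigrams(docs)
    print(terms.most_common(3))
    print(pairs.most_common(3))
```

Keeping tokenization and pair generation as small pure functions means the same logic could be reused unchanged inside a distributed pipeline, where only the reading and counting steps would differ.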