Re: Announcement & Proposal: HDFS tests on large cluster.

Chamikara Jayalath Thu, 07 Jun 2018 13:37:44 -0700

We still use Jenkins machines to execute the tests, but the data stores are
hosted in Kubernetes.


On Thu, Jun 7, 2018 at 1:35 PM Pablo Estrada <[email protected]> wrote:

> Just out of curiosity: This does not use the Jenkins machines then?
> -P.
>
> On Thu, Jun 7, 2018 at 1:33 PM Alan Myrvold <[email protected]> wrote:
>
>> Done. Changed the size of the io-datastores Kubernetes cluster in
>> apache-beam-testing to 3 nodes.
>>
>> On Thu, Jun 7, 2018 at 1:45 AM Kamil Szewczyk <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> the node pool size of the io-datastores Kubernetes cluster in the
>>> apache-beam-testing project must be changed from 1 to 3 (or another value).
>>> @Alan Myrvold has already helped with the Kubernetes cluster settings so
>>> far, but I don't know who makes decisions about this, given that
>>> it will increase the monthly billing.
>>>
>>> Kamil Szewczyk
>>>
>>> 2018-06-07 6:27 GMT+02:00 Kenneth Knowles <[email protected]>:
>>>
>>>> This is rad. Another +1 from me for a bigger cluster. What do you need
>>>> to make that happen?
>>>>
>>>> Kenn
>>>>
>>>> On Wed, Jun 6, 2018 at 10:16 AM Pablo Estrada <[email protected]>
>>>> wrote:
>>>>
>>>>> This is really cool!
>>>>>
>>>>> +1 for having a cluster with more than one machine run the test.
>>>>>
>>>>> -P.
>>>>>
>>>>> On Wed, Jun 6, 2018 at 9:57 AM Chamikara Jayalath <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> On Wed, Jun 6, 2018 at 5:19 AM Łukasz Gajowy <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'd like to announce that thanks to Kamil Szewczyk, since this PR
>>>>>>> <https://github.com/apache/beam/pull/5441> we have 4 file-based
>>>>>>> HDFS tests run on a "Large HDFS Cluster"! More specifically I mean:
>>>>>>>
>>>>>>> - beam_PerformanceTests_TextIOIT_HDFS
>>>>>>> - beam_PerformanceTests_Compressed_TextIOIT_HDFS
>>>>>>> - beam_PerformanceTests_AvroIOIT_HDFS
>>>>>>> - beam_PerformanceTests_XmlIOIT_HDFS
>>>>>>>
>>>>>>> The "Large HDFS Cluster" (in contrast to the small one, which is also
>>>>>>> available) consists of a master node and three data nodes, all in
>>>>>>> separate pods. Thanks to that, we can mimic more real-life scenarios on
>>>>>>> HDFS (3 distributed nodes) and possibly run bigger tests, so there's
>>>>>>> progress! :)
>>>>>>>
>>>>>>>
>>>>>> This is great. Also, looks like results are available in test
>>>>>> dashboard:
>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688
>>>>>> (BTW we should add information about dashboard to the testing doc:
>>>>>> https://beam.apache.org/contribute/testing/)
>>>>>>
>>>>>>> I'm currently working on proper documentation for this so that
>>>>>>> everyone can use it in IOITs (stay tuned).
>>>>>>>
>>>>>>> Regarding the above, I'd like to propose scaling up the
>>>>>>> Kubernetes cluster. AFAIK, it currently consists of 1 node. If we
>>>>>>> scale it up to e.g. 3 nodes, the HDFS Kubernetes pods will distribute
>>>>>>> themselves across different machines rather than one, making it an
>>>>>>> even more "real-life" scenario (possibly more efficient?). Moreover,
>>>>>>> other Performance Tests (such as JDBC or MongoDB) could use more space
>>>>>>> for their infrastructure as well. Scaling up the cluster could also
>>>>>>> turn out to be useful for some future efforts, like BEAM-4508[1]
>>>>>>> (adapting and running some old IOITs on Jenkins).
>>>>>>>
>>>>>>> WDYT? Are there any objections?
>>>>>>>
>>>>>> +1 for increasing the size of Kubernetes cluster.
>>>>>>
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/browse/BEAM-4508
>>>>>>>
>>>>>>> --
>>>>> Got feedback? go/pabloem-feedback
>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>
>>>>
>>> --
> Got feedback? go/pabloem-feedback
> <https://goto.google.com/pabloem-feedback>
>
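
To illustrate the point made in the thread about pods spreading across
machines, here is a minimal sketch of how the four "Large HDFS Cluster" pods
(one master, three data nodes) might land on a 1-node versus a 3-node cluster.
It uses plain round-robin assignment, which is only a stand-in and not the
real Kubernetes scheduler; the pod and node names are hypothetical and do not
come from the Beam configuration.

```python
from collections import defaultdict


def spread_pods(pods, nodes):
    """Assign pods to nodes round-robin.

    Illustrative only: the real Kubernetes scheduler considers resources,
    affinity rules and more. This just shows the effect of having more nodes.
    """
    placement = defaultdict(list)
    for index, pod in enumerate(pods):
        placement[nodes[index % len(nodes)]].append(pod)
    return dict(placement)


# Hypothetical pod names: one master and three data nodes, as in the thread.
pods = ["hdfs-master", "hdfs-datanode-0", "hdfs-datanode-1", "hdfs-datanode-2"]

# Hypothetical node names for a 1-node and a 3-node cluster.
one_node = spread_pods(pods, ["node-a"])
three_nodes = spread_pods(pods, ["node-a", "node-b", "node-c"])

for name, placement in (("1 node", one_node), ("3 nodes", three_nodes)):
    print(name)
    for node, assigned in sorted(placement.items()):
        print(f"  {node}: {', '.join(assigned)}")
```

With a single node all four pods share one machine; with three nodes no
machine holds more than two pods in this simplified model.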
