Hi Charles,

The mock reader has existed in Drill for some time (thanks to the original authors!), and we recently extended it a bit. There are now three ways to use it:

* Via a physical plan (the original method). See [1].
* Via a specially-coded SQL query (as Boaz explained).
* Via a SQL query that references a JSON file. See [2].

The first step for either of the SQL-based approaches is to configure the mock storage plugin. I always do this from code, so I'm not exactly sure of the steps in the web UI. Basically, you create a plugin definition called "mock" (the name can really be anything) that is an instance of the "mock" storage plugin type; no configuration parameters are needed.
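If you do go through the web UI, I believe the JSON definition is minimal; a sketch from memory (unverified, since I always do this from code), where the only essential bit is the "mock" type:

    {
      "type": "mock",
      "enabled": true
    }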
Then, for the SQL, use the steps that Boaz explained. This gives randomly distributed mock data for a few supported types (int, double, boolean, and float). In the simple-SQL form, the mock table acts as a single row group, so your query will have only one slice. Using the JSON definition, you can create multiple row groups and more complex schemas, and take custom control over the data generator. For that, see [3]. Illustrative sketches of both forms follow.
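For the name-encoded form, here is an illustrative query (the table and column names are made up, following the convention Boaz describes below). The "_10K" suffix asks for 10,000 rows; "id_i" is a random integer, "price_d" a random floating-point value, and "name_s20" a random VARCHAR(20):

    SELECT id_i, price_d, name_s20
    FROM `mock`.`customers_10K`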
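For the JSON-file form, the file spells out the row groups and columns explicitly. Treat the sketch below as hypothetical: I'm writing the field names from memory, so check [3] for the authoritative format. The idea is that each entry describes one row group, with its own record count and column list:

    {
      "descrip": "Example mock table",
      "entries": [
        {
          "records": 100,
          "types": [
            { "name": "id", "type": "INT", "mode": "REQUIRED" },
            { "name": "name", "type": "VARCHAR", "width": 20 }
          ]
        }
      ]
    }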
All of this could use better explanation. Ask questions where we have gaps and I'll go ahead and fill in any needed information.

Thanks,
- Paul

[1] https://github.com/paul-rogers/drill/wiki/Testing-with-Physical-Plans-and-Mock-Data
[2] https://github.com/paul-rogers/drill/wiki/The-Mock-Record-Reader
[3] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/package-info.java

> On Mar 31, 2017, at 12:20 PM, Boaz Ben-Zvi <[email protected]> wrote:
>
> Hi Charles,
>
> Below is an example of using the mock storage; I use this now for testing
> my new code (Hash Aggregation spilling, so this specific test will not
> work for you now …).
>
> The query below:
>
>   SELECT empid_s17, dept_i, branch_i, AVG(salary_i)
>   FROM `mock`.`employee_1200K`
>   GROUP BY empid_s17, dept_i, branch_i
>
> shows that you just make up the names for the table and the columns,
> followed by the size (for the table) and the column type ("i" for integer,
> "d" for float, "s<size>" for a varchar of that size).
>
> Not sure if all the imports used are in 1.10; otherwise you'd need the
> latest code.
>
> Boaz
>
> package org.apache.drill.exec.physical.impl.agg;
>
> import ch.qos.logback.classic.Level;
> import org.apache.drill.BaseTestQuery;
> import org.apache.drill.exec.ExecConstants;
> import org.apache.drill.exec.physical.impl.aggregate.HashAggTemplate;
> import org.apache.drill.exec.planner.physical.PlannerSettings;
> import org.apache.drill.exec.proto.UserBitShared;
> import org.apache.drill.test.ClientFixture;
> import org.apache.drill.test.ClusterFixture;
> import org.apache.drill.test.FixtureBuilder;
> import org.apache.drill.test.LogFixture;
> import org.apache.drill.test.ProfileParser;
> import org.apache.drill.test.QueryBuilder;
> import org.junit.Ignore;
> import org.junit.Test;
>
> import java.util.List;
>
> import static org.junit.Assert.assertEquals;
> import static org.junit.Assert.assertTrue;
>
> /**
>  * Test spilling for the Hash Aggr operator (using the mock reader)
>  */
> public class TestHashAggrSpill extends BaseTestQuery {
>
>   private void runAndDump(ClientFixture client, String sql, long expectedRows,
>                           long spillCycle, long spilledPartitions) throws Exception {
>     String plan = client.queryBuilder().sql(sql).explainJson();
>
>     QueryBuilder.QuerySummary summary = client.queryBuilder().sql(sql).run();
>     if (expectedRows > 0) {
>       assertEquals(expectedRows, summary.recordCount());
>     }
>     System.out.println(String.format("======== \n Results: %,d records, %d batches, %,d ms\n ========",
>         summary.recordCount(), summary.batchCount(), summary.runTimeMs()));
>
>     System.out.println("Query ID: " + summary.queryIdString());
>     ProfileParser profile = client.parseProfile(summary.queryIdString());
>     profile.print();
>     List<ProfileParser.OperatorProfile> ops =
>         profile.getOpsOfType(UserBitShared.CoreOperatorType.HASH_AGGREGATE_VALUE);
>
>     assertTrue(!ops.isEmpty());
>     // check for the first op only
>     ProfileParser.OperatorProfile hag = ops.get(0);
>     long opCycle = hag.getMetric(HashAggTemplate.Metric.SPILL_CYCLE.ordinal());
>     assertEquals(spillCycle, opCycle);
>     long opSpilledPartitions = hag.getMetric(HashAggTemplate.Metric.SPILLED_PARTITIONS.ordinal());
>     assertEquals(spilledPartitions, opSpilledPartitions);
>   }
>
>   /**
>    * Test "normal" spilling: Only 2 partitions (out of 4) would require spilling
>    * ("normal spill" means spill-cycle = 1)
>    *
>    * @throws Exception
>    */
>   @Test
>   public void testHashAggrSpill() throws Exception {
>     LogFixture.LogFixtureBuilder logBuilder = LogFixture.builder()
>         .toConsole()
>         .logger("org.apache.drill.exec.physical.impl.aggregate", Level.WARN);
>
>     FixtureBuilder builder = ClusterFixture.builder()
>         .configProperty(ExecConstants.HASHAGG_MAX_MEMORY_KEY, "46000kB")
>         .configProperty(ExecConstants.HASHAGG_NUM_PARTITIONS_KEY, 16)
>         // .sessionOption(PlannerSettings.EXCHANGE.getOptionName(), true)
>         .maxParallelization(2)
>         .saveProfiles()
>         // .keepLocalFiles()
>         ;
>
>     try (LogFixture logs = logBuilder.build();
>          ClusterFixture cluster = builder.build();
>          ClientFixture client = cluster.clientFixture()) {
>       String sql = "SELECT empid_s17, dept_i, branch_i, AVG(salary_i) " +
>           "FROM `mock`.`employee_1200K` GROUP BY empid_s17, dept_i, branch_i";
>       runAndDump(client, sql, 1_200_000, 1, 2);
>     }
>   }
> }
>
>
> On 3/31/17, 7:59 AM, "Charles Givre" <[email protected]> wrote:
>
>     Hello there,
>     Is there any documentation for the new mock storage engine? It looks
>     really useful.
>     Thanks,
>     - Charles
>
