richardstartin opened a new issue #8009:
URL: https://github.com/apache/pinot/issues/8009
Run this query against the hybrid quick start:
```sql
explain plan for select count(*) from airlineStats where
insubquery(OriginAirportID, 'select idset(DestAirportID) from airlineStats') = 1
```
It prints:
```json
{
"resultTable": {
"dataSchema": {
"columnNames": [
"Operator",
"Operator_Id",
"Parent_Id"
],
"columnDataTypes": [
"STRING",
"INT",
"INT"
]
},
"rows": [
[
"BROKER_REDUCE(limit:10)",
0,
-1
],
[
"COMBINE_AGGREGATE",
1,
0
],
[
"AGGREGATE(aggregations:count(*))",
2,
1
],
[
"TRANSFORM_PASSTHROUGH()",
3,
2
],
[
"PROJECT()",
4,
3
],
[
"FILTER_EXPRESSION(operator:EQ,predicate:inidset(OriginAirportID,'ATowAAABAAAAAAAZARAAAACXJ5gnnCeiJ6snrSe6J8kn4CcRKBwoJyg7KF0oeSiEKJ0oqCi3KL8oISk3KUEpVSlnKXwpgymHKZMpqim9KcUp2SnhKegp6ynsKfMp+ykCKhsqHSohKigqMCpFKmEqdCp6KqYqriqyKuQq7iryKvsqISsiKykrMSs6KzsrRCtZK2UrZytyK4Qriiu5K8Mr9Cv7KwMsCiwOLBwsIiwsLDMsSSyVLJ8sqSzPLNks7ywFLREtFC1TLVwtYS1iLWgtbi11LXYteS2ALbEtyS3OLf8tAi4vLlkuWy5sLoEukS6xLsUuyS7MLs4u0i7bLtwu4y7nLuwu8C4+L40vny+lL64vuS/oL+ov9i/4LyAwIzAvMDMwNzBlMGcwcjCZMKAwozC+MN8w6zDWMRMyVDJYMlkyWzJcMmAyczKRMpcymTKaMrYywDLgMuUyBTNHM2YzgDOOM5QzrjOwM7wzyDPQM90z6jPwM/czHjQgNDA0NzRBNEw0bjRwNHk0pDStNK40rzS3NL40CTXjNeQ1BjYbNi82MTZDNmo2azZtNow2kja2Nss26TYSNxQ3GzccNyE3KjdxN6w3rjewN7Y34zfxN3k4lziZOJw4uDi8OM846jjuOPA4KzlSOVc5WzldOWE5aDlqOXU5ijmbObM5vznKOd457DnvOfo5+zkVOi06OTo8Omg6cDqKOqg6sDqzOsE6yDreOvg6kTu/O8g7/DsKPBA8FDwdPCk8Mzw0PPc8CD3hPS8+Wj8=')
= '1')",
5,
4
]
]
},
"exceptions": [],
"numServersQueried": 1,
"numServersResponded": 1,
"numSegmentsQueried": 1,
"numSegmentsProcessed": 0,
"numSegmentsMatched": 0,
"numConsumingSegmentsQueried": 0,
"numDocsScanned": 0,
"numEntriesScannedInFilter": 0,
"numEntriesScannedPostFilter": 0,
"numGroupsLimitReached": false,
"totalDocs": 289,
"timeUsedMs": 22,
"offlineThreadCpuTimeNs": 0,
"realtimeThreadCpuTimeNs": 0,
"offlineSystemActivitiesCpuTimeNs": 0,
"realtimeSystemActivitiesCpuTimeNs": 0,
"offlineResponseSerializationCpuTimeNs": 0,
"realtimeResponseSerializationCpuTimeNs": 0,
"offlineTotalCpuTimeNs": 0,
"realtimeTotalCpuTimeNs": 0,
"segmentStatistics": [],
"traceInfo": {},
"minConsumingFreshnessTimeMs": 0,
"numRowsResultSet": 6
}
```
Printing function parameters in an explain plan leaks data. The
Base64-encoded idsets can be deserialised to reveal the values of an entire
column, and anyone capable of reading the source code can decode these
parameters:
```java
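import java.io.IOException;
import java.nio.*;
import java.util.*;
import org.roaringbitmap.RoaringBitmap;
public class DecodeIdSet {
  public static void main(String... args) throws IOException {
    // Skip the first byte, then read the rest as a little-endian RoaringBitmap.
    ByteBuffer idset = ByteBuffer.wrap(Base64.getDecoder().decode(args[0]))
        .position(1).slice().order(ByteOrder.LITTLE_ENDIAN);
    RoaringBitmap bitmap = new RoaringBitmap();
    bitmap.deserialize(idset);
    System.err.println(Arrays.toString(bitmap.toArray()));
  }
}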
```
This prints the airport IDs. The subquery could just as easily have selected
the social security numbers of users satisfying some condition:
```
[10135, 10136, 10140, 10146, 10155, 10157, 10170, 10185, 10208, 10257,
10268, 10279, 10299, 10333, 10361, 10372, 10397, 10408, 10423, 10431, 10529,
10551, 10561, 10581, 10599, 10620, 10627, 10631, 10643, 10666, 10685, 10693,
10713, 10721, 10728, 10731, 10732, 10739, 10747, 10754, 10779, 10781, 10785,
10792, 10800, 10821, 10849, 10868, 10874, 10918, 10926, 10930, 10980, 10990,
10994, 11003, 11041, 11042, 11049, 11057, 11066, 11067, 11076, 11097, 11109,
11111, 11122, 11140, 11146, 11193, 11203, 11252, 11259, 11267, 11274, 11278,
11292, 11298, 11308, 11315, 11337, 11413, 11423, 11433, 11471, 11481, 11503,
11525, 11537, 11540, 11603, 11612, 11617, 11618, 11624, 11630, 11637, 11638,
11641, 11648, 11697, 11721, 11726, 11775, 11778, 11823, 11865, 11867, 11884,
11905, 11921, 11953, 11973, 11977, 11980, 11982, 11986, 11995, 11996, 12003,
12007, 12012, 12016, 12094, 12173, 12191, 12197, 12206, 12217, 12264, 12266,
12278, 12280, 12320, 12323, 12335, 12339, 12343, 12389, 12391, 12402, 12441,
12448, 12451, 12478, 12511, 12523, 12758, 12819, 12884, 12888, 12889, 12891,
12892, 12896, 12915, 12945, 12951, 12953, 12954, 12982, 12992, 13024, 13029,
13061, 13127, 13158, 13184, 13198, 13204, 13230, 13232, 13244, 13256, 13264,
13277, 13290, 13296, 13303, 13342, 13344, 13360, 13367, 13377, 13388, 13422,
13424, 13433, 13476, 13485, 13486, 13487, 13495, 13502, 13577, 13795, 13796,
13830, 13851, 13871, 13873, 13891, 13930, 13931, 13933, 13964, 13970, 14006,
14027, 14057, 14098, 14100, 14107, 14108, 14113, 14122, 14193, 14252, 14254,
14256, 14262, 14307, 14321, 14457, 14487, 14489, 14492, 14520, 14524, 14543,
14570, 14574, 14576, 14635, 14674, 14679, 14683, 14685, 14689, 14696, 14698,
14709, 14730, 14747, 14771, 14783, 14794, 14814, 14828, 14831, 14842, 14843,
14869, 14893, 14905, 14908, 14952, 14960, 14986, 15016, 15024, 15027, 15041,
15048, 15070, 15096, 15249, 15295, 15304, 15356, 15370, 15376, 15380, 15389,
15401, 15411, 15412, 15607, 15624, 15841, 15919, 16218]
```
This makes it impossible for a business user to take an explain plan from a
production database on behalf of an operator and share it to diagnose a
performance problem. It also rules out a role common in enterprises that gives
technical users the ability to run diagnostic commands but not to access
production data, because by combining explain plans and idsets they can
access essentially any data they like.
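
One possible direction, sketched here only to illustrate the idea and not as Pinot's actual fix, is to mask long quoted literals before an operator description is written into the plan. The class name `ExplainPlanRedactor`, the `redact` method, and the 32-character threshold are all hypothetical; the input in `main` is a shortened stand-in for the idset literal shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pinot: masks long quoted literals in plan text.
public class ExplainPlanRedactor {
  // Assumed threshold: quoted literals longer than this are masked.
  private static final int MAX_LITERAL_LENGTH = 32;
  // Matches single-quoted literals that contain no embedded quotes.
  private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

  /** Replaces each long quoted literal with a placeholder that reports only its length. */
  public static String redact(String operatorDescription) {
    Matcher m = QUOTED.matcher(operatorDescription);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String literal = m.group(1);
      String replacement = literal.length() > MAX_LITERAL_LENGTH
          ? "'<redacted:" + literal.length() + " chars>'"
          : m.group();
      // quoteReplacement keeps '$' and '\' in the literal from being interpreted.
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    // Shortened stand-in for the Base64 idset literal in the plan above.
    String row = "FILTER_EXPRESSION(operator:EQ,predicate:inidset(OriginAirportID,"
        + "'ATowAAABAAAAAAAZARAAAACXJ5gnnCeiJ6snrSe6J8kn4Cc') = '1')";
    System.out.println(redact(row));
  }
}
```

Short literals such as `'1'` are left intact so the plan stays useful for diagnosis, while the serialised idset is reduced to its length. A real fix would more likely act on the parsed expression tree than on the rendered string, so treat the regex approach as an assumption made for brevity.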