Please log a Jira case for the commons-lang3 change. It looks good. In one or two places I'd create a function rather than leave a blob of code inline.
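To make the locale concern in the next paragraph concrete, here is a minimal Java sketch. The class and method names are made up for illustration and are not taken from the PR or from Calcite; it uses only the JDK. It shows why locale-sensitive calls should take an explicit locale such as Locale.ROOT rather than relying on whatever the JVM default happens to be:

```java
import java.util.Locale;

// Hypothetical illustration only; not code from Calcite or from the PR.
class LocaleSketch {
  /** Lower-cases with an explicit locale, so the result does not depend on JVM settings. */
  static String lower(String s, Locale locale) {
    return s.toLowerCase(locale);
  }

  /** Formats a number with an explicit locale; the decimal separator follows that locale. */
  static String formatDecimal(double d, Locale locale) {
    return String.format(locale, "%.1f", d);
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr-TR");

    // Same input, different output depending on the locale in effect.
    System.out.println(lower("TITLE", Locale.ROOT)); // title
    System.out.println(lower("TITLE", turkish));     // t, dotless i (U+0131), tle

    System.out.println(formatDecimal(1.5, Locale.ROOT));    // 1.5
    System.out.println(formatDecimal(1.5, Locale.GERMANY)); // 1,5
  }
}
```

The no-argument forms (for example s.toLowerCase() or String.format(fmt, ...)) behave like the second call in each pair whenever the server happens to run with such a default locale, which is the kind of call a forbiddenApis check is meant to catch.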
Your use of the default locale in the CSV adapter looks wrong. Calcite is a server, so it never uses the default locale or time zone. In fact, we use forbiddenApis to check this, so we should add a few methods to its configuration.

Julian

> On Jan 3, 2022, at 12:30 PM, Gunnar Morling <[email protected]> wrote:
>
> Hi,
>
> Thanks a lot for this, I think trimming down the dependencies of Calcite
> will be of great help for its adoption.
>
>> So, the easiest way to reduce dependencies would be to make certain
>> classes of SQL functions optional (i.e. move them out of core).
>
> That sounds like a good idea.
>
>> commons-lang3, commons-codec, commons-io are probably only used in one or
>> two places each;
>
> To make some progress there, I've created PR
> https://github.com/apache/calcite/pull/2672 which removes the dependency on
> commons-lang3 from the entire code base. Any feedback on that PR would
> be appreciated (I still need to log an issue, but wanted to share quickly
> what I had). I can try and take a look at the other ones, if there's
> interest in this.
>
> Re Janino, is there any reason for not using the compiler implementation
> coming with the JDK? Alternatively, one could also consider generating
> byte code directly using ASM, which wouldn't be beneficial dependency-wise,
> but it may improve the performance of this generation step (I still lack
> insight why this is done in the first place).
>
> Thanks,
>
> --Gunnar
>
>> On Fri, Dec 31, 2021 at 00:56, Julian Hyde <[email protected]> wrote:
>>
>> Regarding dependencies.
>> Here are the runtime dependencies from
>> core/build.gradle.kts (ignoring test and annotation libraries):
>>
>> * api("com.esri.geometry:esri-geometry-api")
>> * api("com.fasterxml.jackson.core:jackson-annotations")
>> * api("com.google.guava:guava")
>> * api("org.apache.calcite.avatica:avatica-core")
>> * api("org.slf4j:slf4j-api")
>> * implementation("com.fasterxml.jackson.core:jackson-core")
>> * implementation("com.fasterxml.jackson.core:jackson-databind")
>> * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
>> * implementation("com.google.uzaygezen:uzaygezen-core")
>> * implementation("com.jayway.jsonpath:json-path")
>> * implementation("com.yahoo.datasketches:sketches-core")
>> * implementation("commons-codec:commons-codec")
>> * implementation("net.hydromatic:aggdesigner-algorithm")
>> * implementation("org.apache.commons:commons-dbcp2")
>> * implementation("org.apache.commons:commons-lang3")
>> * implementation("commons-io:commons-io")
>> * implementation("org.codehaus.janino:commons-compiler")
>> * implementation("org.codehaus.janino:janino")
>>
>> A few libraries are used only for a narrow range of functionality:
>> * esri-geometry and uzaygezen-core are used by geospatial functions;
>> * sketches-core is used by the HLL aggregate functions;
>> * json-path is used by some JSON functions;
>> * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
>>   load models, and to serialize RelNodes to and from JSON;
>> * commons-lang3, commons-codec, commons-io are probably only used in one
>>   or two places each;
>> * aggdesigner-algorithm is used for recommending materialized views.
>>
>> So, the easiest way to reduce dependencies would be to make certain
>> classes of SQL functions optional (i.e. move them out of core).
>>
>> Julian
>>
>>>> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <[email protected]> wrote:
>>>
>>> WRT SBOM (Julian): My general experience is that most large orgs use
>>> scanners now (either open or closed) and they will scan whether you have a
>>> bill of materials or not. I wouldn't worry about adding something
>>> additional.
>>>
>>> WRT too many dependencies (Gunnar): I completely agree with the general
>>> feeling of too many (and with Guava, jackson less so). I think the core
>>> challenge (no pun intended) is that calcite-core is really a lot of
>>> different components. For example, I have frequently wished that parser,
>>> planner and enumerable were separate modules. And if they were, I'd guess
>>> that each would have a narrower dependency range. I've also wished many
>>> times that runtime compilation was an optional addon as opposed to
>>> required/coupled in the core...
>>>
>>> When I've thought about how to dissect in the past, I think the big
>>> challenge would be tests, where things are sometimes mixed together.
>>> Breaking change possibilities could be at least somewhat mitigated by
>>> moving classes but not packages.
>>>
>>> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> In a way, Calcite's build configuration as well as the published POM could
>>>> be considered as such an SBOM? In particular when looking at the latter
>>>> through services like mvnrepository [1], you get quite a good view on the
>>>> dependency versions, licenses, any potential CVEs, etc. I think this should
>>>> satisfy most user needs around this? Or are you referring to the notion of
>>>> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM with
>>>> all the Calcite component versions which people can then use with Maven's
>>>> import scope (there should be something comparable for Gradle)?
>>>> If so, that could be useful for users working with multiple Calcite
>>>> components, though I think the usability improvement provided by such
>>>> a BOM POM wouldn't be huge.
>>>>
>>>> I wanted to bring up a related matter though. Coming to Calcite as a user
>>>> just recently (loving the possibilities it provides!), I was surprised by
>>>> the large number of dependencies of the project. It looks like 1.29
>>>> improves that a little bit (no more kotlin-stdlib, no more transitive
>>>> dependency to log4j 1.x), but the transitive hull of all dependencies of
>>>> calcite-core still is quite big. I lack insight about what the different
>>>> dependencies are used for; but as an application developer, Guava for
>>>> instance is a dependency which I'd prefer to not get pushed onto the
>>>> classpath transitively. Jackson is another heavy one; depending on how it's
>>>> used, perhaps this could be pushed into some separate module which users
>>>> could optionally pull in? That'd help to avoid having it around when users
>>>> work with other JSON libs themselves and don't require JSON support in
>>>> Calcite.
>>>>
>>>> From a supply chain perspective, the fewer transitive dependencies a library
>>>> like Calcite introduces to my project, the better IMHO. Less potential for
>>>> version conflicts with my own (or other transitive) dependencies, and also
>>>> less potential for introducing CVEs to the dependency graph, as e.g. in the
>>>> case of the Guava version currently used by Calcite; I suppose it does not
>>>> impact the usage in Calcite, but these things tend to be tricky to reason
>>>> about, and typical CVE reporting tooling will now create a warning for a
>>>> project using Calcite, no matter whether that specific issue actually is a
>>>> problem or not.
>>>>
>>>> Best,
>>>>
>>>> --Gunnar
>>>>
>>>> [1] https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
>>>> [2] https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
>>>>
>>>> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <[email protected]> wrote:
>>>>
>>>>> In the wake of the log4j CVEs [1], people are asking how to improve the
>>>>> security of open source projects, and one idea is to provide an SBOM
>>>>> (Software Bill of Materials) [2] along with each release.
>>>>>
>>>>> I had not heard of SBOM until a couple of days ago. Is anyone on this list
>>>>> familiar with SBOMs and their use? Should Calcite be providing an SBOM? Are
>>>>> people aware of SBOM initiatives in other projects? What, in your opinion,
>>>>> is the priority of this issue?
>>>>>
>>>>> Julian
>>>>>
>>>>> [1] https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
>>>>>
>>>>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
