Copilot commented on code in PR #2894:
URL: https://github.com/apache/tika/pull/2894#discussion_r3430547385
##########
tika-integration-tests/tika-pipes-solr-integration-tests/src/test/java/org/apache/tika/pipes/solr/tests/TikaPipesSolrTestBase.java:
##########
@@ -36,8 +36,8 @@
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.SolrClient;
-import org.apache.solr.client.solrj.SolrQuery;
-import org.apache.solr.client.solrj.impl.Http2SolrClient;
+import org.apache.solr.client.solrj.jetty.HttpJettySolrClient;
+import org.apache.solr.client.solrj.request.SolrQuery;
Review Comment:
Same issue as in SolrPipesIterator: SolrQuery is usually provided as
org.apache.solr.client.solrj.SolrQuery. If
org.apache.solr.client.solrj.request.SolrQuery is not present in SolrJ 10.0.0,
the test module will not compile. Please verify and correct the import.
##########
tika-pipes/tika-pipes-plugins/tika-pipes-solr/src/main/java/org/apache/tika/pipes/iterator/solr/SolrPipesIterator.java:
##########
@@ -26,11 +26,12 @@
import java.util.concurrent.TimeoutException;
import org.apache.solr.client.solrj.SolrClient;
-import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
-import org.apache.solr.client.solrj.impl.Http2SolrClient;
-import org.apache.solr.client.solrj.impl.LBHttpSolrClient;
+import org.apache.solr.client.solrj.impl.LBSolrClient;
+import org.apache.solr.client.solrj.jetty.HttpJettySolrClient;
+import org.apache.solr.client.solrj.jetty.LBJettySolrClient;
+import org.apache.solr.client.solrj.request.SolrQuery;
Review Comment:
The import for SolrQuery appears to be incorrect. In SolrJ, SolrQuery is
typically in org.apache.solr.client.solrj.SolrQuery (not
org.apache.solr.client.solrj.request.SolrQuery). If this new package does not
exist in SolrJ 10.0.0, this will fail compilation—please switch back to the
correct SolrQuery import (and update usages only if SolrJ truly moved the
class).
##########
tika-pipes/tika-pipes-plugins/tika-pipes-solr/src/main/java/org/apache/tika/pipes/emitter/solr/SolrEmitter.java:
##########
@@ -108,26 +109,42 @@ private static SolrClient
buildSolrClient(SolrEmitterConfig config) throws TikaC
if (config.solrUrls() == null || config.solrUrls().isEmpty()) {
// Use ZooKeeper-based CloudSolrClient
- Http2SolrClient.Builder http2SolrClientBuilder = new
Http2SolrClient.Builder();
- if (!StringUtils.isBlank(httpClientFactory.getUserName())) {
-
http2SolrClientBuilder.withBasicAuthCredentials(httpClientFactory.getUserName(),
httpClientFactory.getPassword());
- }
- http2SolrClientBuilder
+ HttpJettySolrClient.Builder jettyClientBuilder = new
HttpJettySolrClient.Builder();
+ applyAuthAndProxy(jettyClientBuilder, httpClientFactory,
config.proxyHost(), config.proxyPort());
+ jettyClientBuilder
.withRequestTimeout(httpClientFactory.getRequestTimeoutMillis(),
TimeUnit.MILLISECONDS)
.withConnectionTimeout(config.getConnectionTimeoutMillisOrDefault(),
TimeUnit.MILLISECONDS);
- Http2SolrClient http2SolrClient = http2SolrClientBuilder.build();
return new CloudSolrClient.Builder(config.solrZkHosts(),
Optional.ofNullable(config.solrZkChroot()))
- .withHttpClient(http2SolrClient)
+ .withInternalClientBuilder(jettyClientBuilder)
.build();
} else {
- // Use direct URL-based LBHttpSolrClient
- return new LBHttpSolrClient.Builder()
+ // Use direct URL-based LBJettySolrClient
+ HttpJettySolrClient.Builder jettyClientBuilder = new
HttpJettySolrClient.Builder();
+ applyAuthAndProxy(jettyClientBuilder, httpClientFactory,
config.proxyHost(), config.proxyPort());
+ jettyClientBuilder
+
.withRequestTimeout(httpClientFactory.getRequestTimeoutMillis(),
TimeUnit.MILLISECONDS)
.withConnectionTimeout(config.getConnectionTimeoutMillisOrDefault(),
TimeUnit.MILLISECONDS)
-
.withSocketTimeout(config.getSocketTimeoutMillisOrDefault(),
TimeUnit.MILLISECONDS)
- .withHttpClient(httpClientFactory.build())
- .withBaseEndpoints(config.solrUrls().toArray(new
String[]{}))
- .build();
+ .withIdleTimeout(config.getSocketTimeoutMillisOrDefault(),
TimeUnit.MILLISECONDS);
+ HttpJettySolrClient jettyClient = jettyClientBuilder.build();
+ LBSolrClient.Endpoint[] endpoints = config.solrUrls().stream()
+ .map(LBSolrClient.Endpoint::new)
+ .toArray(LBSolrClient.Endpoint[]::new);
+ return new LBJettySolrClient.Builder(jettyClient,
endpoints).build();
+ }
+ }
+
+ private static void applyAuthAndProxy(HttpJettySolrClient.Builder builder,
HttpClientFactory factory,
+ String proxyHost, Integer proxyPort)
throws TikaConfigException {
+ if (!StringUtils.isBlank(factory.getUserName())) {
+ if (!"basic".equalsIgnoreCase(factory.getAuthScheme())) {
+ throw new TikaConfigException("Only 'basic' auth scheme is
supported by HttpJettySolrClient; got: '"
+ + factory.getAuthScheme() + "'");
+ }
+ builder.withBasicAuthCredentials(factory.getUserName(),
factory.getPassword());
+ }
+ if (!StringUtils.isBlank(proxyHost) && proxyPort != null && proxyPort
> 0) {
+ builder.withProxyConfiguration(proxyHost, proxyPort, false, false);
}
}
Review Comment:
The new applyAuthAndProxy logic is duplicated (very similarly) in both
SolrEmitter and SolrPipesIterator. To reduce drift and future incompatibilities
(e.g., adding SSL/proxy auth support), consider extracting a shared helper
(e.g., a small utility in the solr pipes module) that applies auth/proxy
consistently for both emitter and iterator.
##########
tika-server/tika-server-core/pom.xml:
##########
@@ -95,7 +95,11 @@
</dependency>
<dependency>
<groupId>org.eclipse.jetty.http2</groupId>
- <artifactId>http2-server</artifactId>
+ <artifactId>jetty-http2-server</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>jakarta.servlet</groupId>
+ <artifactId>jakarta.servlet-api</artifactId>
</dependency>
Review Comment:
jakarta.servlet-api is an API-only dependency that is typically provided by
the runtime/container. If tika-server-core is intended to run with a servlet
container (or includes Jetty modules that already provide the servlet API
transitively), consider setting this dependency's scope to provided to avoid
bundling/duplicating servlet classes on the classpath.
##########
tika-pipes/tika-pipes-plugins/tika-pipes-solr/src/main/java/org/apache/tika/pipes/iterator/solr/SolrPipesIterator.java:
##########
@@ -180,27 +181,40 @@ private SolrClient createSolrClient() throws
TikaConfigException {
List<String> solrZkHosts = config.getSolrZkHosts() != null ?
config.getSolrZkHosts() : Collections.emptyList();
if (solrUrls.isEmpty()) {
- //TODO -- there's more that we need to pass through, including ssl
etc.
- Http2SolrClient.Builder http2SolrClientBuilder = new
Http2SolrClient.Builder();
- if (!StringUtils.isBlank(httpClientFactory.getUserName())) {
-
http2SolrClientBuilder.withBasicAuthCredentials(httpClientFactory.getUserName(),
httpClientFactory.getPassword());
- }
- http2SolrClientBuilder
+ HttpJettySolrClient.Builder jettyClientBuilder = new
HttpJettySolrClient.Builder();
+ applyAuthAndProxy(jettyClientBuilder, httpClientFactory);
+ jettyClientBuilder
.withRequestTimeout(httpClientFactory.getRequestTimeoutMillis(),
TimeUnit.MILLISECONDS)
.withConnectionTimeout(config.getConnectionTimeoutMillis(),
TimeUnit.MILLISECONDS);
-
- Http2SolrClient http2SolrClient = http2SolrClientBuilder.build();
return new CloudSolrClient.Builder(solrZkHosts,
Optional.ofNullable(config.getSolrZkChroot()))
- .withHttpClient(http2SolrClient)
+ .withInternalClientBuilder(jettyClientBuilder)
.build();
+ }
+ HttpJettySolrClient.Builder jettyClientBuilder = new
HttpJettySolrClient.Builder();
+ applyAuthAndProxy(jettyClientBuilder, httpClientFactory);
+ jettyClientBuilder
+
.withRequestTimeout(httpClientFactory.getRequestTimeoutMillis(),
TimeUnit.MILLISECONDS)
+ .withConnectionTimeout(config.getConnectionTimeoutMillis(),
TimeUnit.MILLISECONDS)
+ .withIdleTimeout(config.getSocketTimeoutMillis(),
TimeUnit.MILLISECONDS);
+ HttpJettySolrClient jettyClient = jettyClientBuilder.build();
+ LBSolrClient.Endpoint[] endpoints = solrUrls.stream()
+ .map(LBSolrClient.Endpoint::new)
+ .toArray(LBSolrClient.Endpoint[]::new);
+ return new LBJettySolrClient.Builder(jettyClient, endpoints).build();
+ }
+ private static void applyAuthAndProxy(HttpJettySolrClient.Builder builder,
+ HttpClientFactory factory) throws
TikaConfigException {
+ if (!StringUtils.isBlank(factory.getUserName())) {
+ if (!"basic".equalsIgnoreCase(factory.getAuthScheme())) {
+ throw new TikaConfigException("Only 'basic' auth scheme is
supported by HttpJettySolrClient; got: '"
+ + factory.getAuthScheme() + "'");
+ }
+ builder.withBasicAuthCredentials(factory.getUserName(),
factory.getPassword());
+ }
+ if (!StringUtils.isBlank(factory.getProxyHost()) &&
factory.getProxyPort() > 0) {
+ builder.withProxyConfiguration(factory.getProxyHost(),
factory.getProxyPort(), false, false);
}
Review Comment:
The call to withProxyConfiguration uses two boolean literals (false, false),
which makes the intent unclear and is easy to misuse. Consider wrapping this in
a small named method (or using an overload/enum-like configuration if
available) so the meaning of these flags is explicit at the call site.
##########
tika-grpc/src/test/java/org/apache/tika/pipes/grpc/PipesBiDirectionalStreamingIntegrationTest.java:
##########
@@ -92,9 +91,7 @@ static void setUpHttpServer() throws Exception {
ResourceHandler resourceHandler = new ResourceHandler();
resourceHandler.setDirAllowed(true);
- // TODO when using jetty 12:
- //
resourceHandler.setBaseResourceAsString("src/test/resources/test-files")
- resourceHandler.setBaseResource(new PathResource(Paths.get("src",
"test", "resources", "test-files")));
+
resourceHandler.setBaseResourceAsString("src/test/resources/test-files");
Review Comment:
Using a hard-coded relative path string for the resource base can be brittle
in multi-module builds where the working directory may not be the module root.
To make the test more reliable across CI/IDE runs, consider resolving the path
via Paths.get(...).toAbsolutePath().toString() (or loading the directory via
the classpath and converting to a file/resource if Jetty requires a filesystem
path).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]