[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532136#comment-14532136 ]

Bertrand Delacretaz commented on SLING-4694:
--------------------------------------------

I'm thinking of ContentAwareMimeTypeService as a better way to implement this, one that is reusable in other contexts.

We might use a detectionMode service property to make several variants of ContentAwareMimeTypeService available (filename-only, using Tika, etc.), and clients (like the webdav servlet) can then select one based on that property.

SlingWebDavServlet should have a configurable Tika detector
-----------------------------------------------------------

Key: SLING-4694
URL: https://issues.apache.org/jira/browse/SLING-4694
Project: Sling
Issue Type: Improvement
Components: Servlets
Affects Versions: JCR Webdav 2.2.2
Reporter: Satya Deep Maheshwari

*Problem description:*
I am facing a problem with the mime type detection of a file. While debugging, I see that the SlingTikaDetector.detect method is used for detecting the mime type of my file. See [1]. This method seems to rely only on the name of the file for detecting its mime type. Even though it is passed an input stream of the file, it does not seem to use it for mime type detection. So if my file name is something like xyz.tmp, it detects its mime type as application/octet-stream (the default) while it may actually be a png file. This is a common scenario with webdav clients, wherein temporary files get created with such names while being edited.

*Suggested Solution:*
Quoting [~rombert]
{quote}
Following the discussions at SLING-1059 [1] and SLING-255 [2] I can infer that we more or less opted out of the 'heavy-weight' approach of actually parsing the input stream. Not sure if we want to revisit that TBH. At any rate, our MimeTypeService does not have an API for getting the file content based on the input stream.

I think though there's a way around it, but only at the code level. The org.apache.sling.jcr.webdav.impl.helper.SlingResourceConfig class hardcodes the Detector implementation to be a SlingTikaDetector. I think it is worthwhile to raise a Jira issue for this and it would definitely expedite the fix if you're willing to submit a patch / pull request. I think it can be as simple as adding a @Reference to a Tika Detector to the SlingWebDavServlet and then passing that to the SlingServletConfig.

Cheers,
Robert

[1]: https://issues.apache.org/jira/browse/SLING-1059
[2]: https://issues.apache.org/jira/browse/SLING-255
{quote}

Related mailing-list thread on this: http://apache-sling.73963.n3.nabble.com/mime-type-detection-td4050586.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
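
A minimal sketch of the detectionMode idea from the comment above, in plain Java so it runs without an OSGi container. Everything here is an assumption made for illustration: ContentAwareMimeTypeService is only a proposal at this point, and the registry class and the mode values "filename-only" and "tika" are hypothetical. In a real OSGi setup the selection would more likely be a target filter such as (detectionMode=tika) on the service reference.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in for the proposed service; not an existing Sling API.
interface ContentAwareMimeTypeService {
    String getMimeType(String filename, InputStream content);
}

// Hypothetical registry that mimics looking services up by a "detectionMode" property.
class DetectionModeRegistry {
    private final Map<String, ContentAwareMimeTypeService> byMode = new LinkedHashMap<>();

    void register(String detectionMode, ContentAwareMimeTypeService service) {
        byMode.put(detectionMode, service);
    }

    Optional<ContentAwareMimeTypeService> select(String detectionMode) {
        return Optional.ofNullable(byMode.get(detectionMode));
    }
}

class DetectionModeDemo {
    static DetectionModeRegistry sampleRegistry() {
        DetectionModeRegistry registry = new DetectionModeRegistry();
        // "filename-only" variant: ignores the stream entirely.
        registry.register("filename-only", (filename, content) ->
                filename != null && filename.endsWith(".png")
                        ? "image/png" : "application/octet-stream");
        // "tika" variant: placeholder only, a real one would delegate to Tika.
        registry.register("tika", (filename, content) -> "application/octet-stream");
        return registry;
    }

    public static void main(String[] args) {
        ContentAwareMimeTypeService service = sampleRegistry()
                .select("filename-only")
                .orElseThrow(() -> new IllegalStateException("no such detection mode"));
        System.out.println(service.getMimeType("photo.png", null));
    }
}
```

A client such as the webdav servlet would then only need to know the mode it wants, not which bundle provides the variant.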
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532177#comment-14532177 ]

Satya Deep Maheshwari commented on SLING-4694:
----------------------------------------------

I like this idea. One problem I see with it, though: the current SlingTikaDetector would use ContentAwareMimeTypeService, and ContentAwareMimeTypeService would itself need a Tika detector to figure out the mime type. So we have two detectors coming into the picture, and they need to be different ones (how?) to avoid a recursive situation.

I would want to have MimeTypeService configurable as you suggest, but perhaps also make SlingWebDavServlet configurable for the detector, for this special case wherein MimeTypeService is itself being used by a Tika detector, i.e. SlingTikaDetector.
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532196#comment-14532196 ]

Bertrand Delacretaz commented on SLING-4694:
--------------------------------------------

SlingTikaDetector uses Tika APIs but it can be just a bridge for the new ContentAwareMimeTypeService, with no actual detection logic.

The current SlingTikaDetector tries to extract a filename from the Metadata.RESOURCE_NAME_KEY String. IMO that extraction should also be a service (FilenameExtractor) provided by the Sling mimetype bundle.

SlingTikaDetector also falls back to the Tika MediaType.parse(..) method if the Sling MimeTypeService does not find a mime type. This can instead be a feature of a Tika-based ContentAwareMimeTypeService.

To avoid any risks of Tika API incompatibilities, I would embed the required Tika classes in the webdav bundle, and make the ContentAwareMimeTypeService API independent of Tika.

Note that we must make sure the affected webdav bundle code has sufficient test coverage in this area before making any changes, so that those changes stay backwards compatible.
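
The bridge idea in the comment above could look roughly like the following sketch. It is not the actual SlingTikaDetector: the interfaces are stand-ins declared locally so the example has no Tika or Sling dependency, the last-path-segment rule in the extractor is an assumption about what a FilenameExtractor might do, and the Tika MediaType.parse(..) fallback is replaced by a plain default type.

```java
import java.io.InputStream;

// Stand-in for the proposed service; not an existing Sling API.
interface ContentAwareMimeTypeService {
    String getMimeType(String filename, InputStream content);
}

// Stand-in for the FilenameExtractor service suggested in the comment.
interface FilenameExtractor {
    String extract(String resourceName);
}

// Assumption: the filename is the last path segment of the resource name.
class LastSegmentFilenameExtractor implements FilenameExtractor {
    @Override
    public String extract(String resourceName) {
        if (resourceName == null) {
            return null;
        }
        int slash = Math.max(resourceName.lastIndexOf('/'), resourceName.lastIndexOf('\\'));
        return resourceName.substring(slash + 1);
    }
}

// The bridge holds no detection logic: it extracts a name, delegates, and applies a default.
class DetectorBridge {
    static final String DEFAULT_TYPE = "application/octet-stream";

    private final ContentAwareMimeTypeService service;
    private final FilenameExtractor extractor;

    DetectorBridge(ContentAwareMimeTypeService service, FilenameExtractor extractor) {
        this.service = service;
        this.extractor = extractor;
    }

    String detect(InputStream content, String resourceName) {
        String filename = extractor.extract(resourceName);
        String type = service.getMimeType(filename, content);
        return type != null ? type : DEFAULT_TYPE;
    }
}
```

Because the bridge only delegates, the service it wraps is free to use a different Tika detector internally, which is one way to sidestep the recursion concern raised earlier in the thread.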
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532217#comment-14532217 ]

Oliver Lietz commented on SLING-4694:
-------------------------------------

The content/mime type detector should really be general so it could be used in other bundles. I've started work on a detector in content loader, but currently have no time to continue:

{noformat}
public interface ContentTypeDetector {
    String detectContentType(InputStream contentStream, String filename);
}
{noformat}

[~tomek.rekawek], these classes should be removed as they are not used right now. Do you still work on content loader and clean up further?
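
To make the xyz.tmp case from the issue description concrete, here is a sketch of one possible implementation of the ContentTypeDetector interface quoted above. It is an illustration only, not the content loader code: the class name is hypothetical, it recognises nothing but the PNG signature, and it simply skips content sniffing when the stream does not support mark/reset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;

interface ContentTypeDetector {
    String detectContentType(InputStream contentStream, String filename);
}

// Hypothetical implementation: PNG signature first, then file extension, then a default.
class SniffingContentTypeDetector implements ContentTypeDetector {
    // The eight-byte PNG file signature.
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    @Override
    public String detectContentType(InputStream contentStream, String filename) {
        if (contentStream != null && contentStream.markSupported()) {
            try {
                // Peek at the header, then rewind so the caller can still read the stream.
                contentStream.mark(PNG_SIGNATURE.length);
                byte[] head = new byte[PNG_SIGNATURE.length];
                int total = 0;
                int n;
                while (total < head.length
                        && (n = contentStream.read(head, total, head.length - total)) > 0) {
                    total += n;
                }
                contentStream.reset();
                if (total == head.length && Arrays.equals(head, PNG_SIGNATURE)) {
                    return "image/png";
                }
            } catch (IOException e) {
                // Fall through to name-based detection.
            }
        }
        if (filename != null && filename.toLowerCase(Locale.ROOT).endsWith(".png")) {
            return "image/png";
        }
        return "application/octet-stream";
    }
}

class SniffingDemo {
    public static void main(String[] args) {
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00};
        InputStream in = new ByteArrayInputStream(png);
        // The name says nothing useful, the content decides: prints image/png
        System.out.println(new SniffingContentTypeDetector().detectContentType(in, "xyz.tmp"));
    }
}
```

A caller holding a stream without mark support could wrap it in a java.io.BufferedInputStream before passing it in.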
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532263#comment-14532263 ]

Tomek Rękawek commented on SLING-4694:
--------------------------------------

I finished my work on the content loader (my main purpose was to provide the pluggable readers), but I can help clean the module up. Do you want me to remove the ContentTypeDetector?
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532132#comment-14532132 ]

Satya Deep Maheshwari commented on SLING-4694:
----------------------------------------------

bq. It would be generally useful IMO to create an extended MimeTypeService

[~bdelacretaz] are you suggesting this in addition to making the Tika detector configurable in the WebDav servlet, or as an alternative way of implementing this?
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532223#comment-14532223 ]

Bertrand Delacretaz commented on SLING-4694:
--------------------------------------------

bq. The content/mime type detector should really be general so it could be used in other bundles.

Yes, that's the idea of my {{ContentAwareMimeTypeService}} suggestion.

Best is probably to create a new commons/content-detection bundle that provides those enhanced APIs. I won't have time to work on this in the next few days, so just throwing ideas around ATM.
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532209#comment-14532209 ]

Satya Deep Maheshwari commented on SLING-4694:
----------------------------------------------

bq. SlingTikaDetector uses Tika APIs but it can be just a bridge for the new ContentAwareMimeTypeService, with no actual detection logic.

Makes sense. This clarifies things substantially.
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532472#comment-14532472 ]

Oliver Lietz commented on SLING-4694:
-------------------------------------

Done.
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532399#comment-14532399 ]

Satya Deep Maheshwari commented on SLING-4694:
----------------------------------------------

I'll try to come up with the implementation of {{ContentAwareMimeTypeService}} as discussed.
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530414#comment-14530414 ]

Satya Deep Maheshwari commented on SLING-4694:
----------------------------------------------

I am working on an approach wherein the Sling webdav servlet would be passed a Tika detector via @Reference. The current SlingTikaDetector would also expose itself as a service, and the webdav servlet will get its reference for further use. Alternatively, one can pass another Tika detector to the webdav servlet by exposing it as a service.

Sling quickstart already includes a Tika OSGi bundle which provides the default TikaDetector, which can be used as an alternative to the internal SlingTikaDetector.
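
The approach described in the comment above (pass the detector in rather than hardcode it) might be sketched as follows. This is plain Java with hypothetical names throughout: MimeDetector is a local stand-in and deliberately not Tika's Detector interface, and the bind/unbind methods only imitate what an OSGi @Reference would call on the real servlet.

```java
import java.io.InputStream;
import java.util.Objects;

// Stand-in contract for illustration; this is NOT org.apache.tika.detect.Detector.
interface MimeDetector {
    String detect(InputStream content, String resourceName);
}

// Plays the role of the built-in, name-only default.
class NameOnlyDetector implements MimeDetector {
    @Override
    public String detect(InputStream content, String resourceName) {
        boolean png = resourceName != null && resourceName.endsWith(".png");
        return png ? "image/png" : "application/octet-stream";
    }
}

// Hypothetical config object: it receives its detector instead of constructing one.
class ResourceConfigSketch {
    private final MimeDetector detector;

    ResourceConfigSketch(MimeDetector detector) {
        this.detector = Objects.requireNonNull(detector, "detector");
    }

    MimeDetector getDetector() {
        return detector;
    }
}

// Hypothetical servlet: uses the bound detector, or the default when none is bound.
class WebDavServletSketch {
    private final MimeDetector fallback = new NameOnlyDetector();
    private volatile MimeDetector detector = fallback;

    void bindDetector(MimeDetector bound) {
        detector = bound;
    }

    void unbindDetector(MimeDetector unbound) {
        if (detector == unbound) {
            detector = fallback;
        }
    }

    ResourceConfigSketch createConfig() {
        return new ResourceConfigSketch(detector);
    }
}
```

Keeping a default in the servlet means existing deployments behave as before unless another detector service is registered.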
[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector
[ https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530420#comment-14530420 ]

Bertrand Delacretaz commented on SLING-4694:
--------------------------------------------

It would be generally useful IMO to create an extended {{MimeTypeService}} to be able to do content-based mime type detection. Maybe something like

{code}
public interface ContentAwareMimeTypeService extends MimeTypeService {
    /**
     * @param filename used if content is null or if this service does not support content-based detection
     * @param content optional stream that points to the content to analyze
     *  (TODO explain any relevant constraints on that stream, does it need to support mark/reset etc)
     */
    String getMimeType(String filename, InputStream content);
}
{code}

The webdav code can then use this if available, preferring it to the basic MimeTypeService.
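
The "use this if available" rule from the comment above can be sketched as a small resolver. The two interfaces are cut-down local stand-ins (the real MimeTypeService has more methods than shown, and ContentAwareMimeTypeService is only proposed here), and MimeTypeResolver is a hypothetical helper, not existing webdav code.

```java
import java.io.InputStream;

// Cut-down stand-in for the Sling MimeTypeService (name-based lookup only).
interface MimeTypeService {
    String getMimeType(String name);
}

// Stand-in for the interface proposed in the comment.
interface ContentAwareMimeTypeService extends MimeTypeService {
    String getMimeType(String filename, InputStream content);
}

// Hypothetical helper: prefers the content-aware service when one is present.
class MimeTypeResolver {
    private final MimeTypeService basic;
    private final ContentAwareMimeTypeService contentAware; // may be null

    MimeTypeResolver(MimeTypeService basic, ContentAwareMimeTypeService contentAware) {
        this.basic = basic;
        this.contentAware = contentAware;
    }

    String resolve(String filename, InputStream content) {
        if (contentAware != null) {
            return contentAware.getMimeType(filename, content);
        }
        // No content-aware service available: fall back to the name-based lookup.
        return basic.getMimeType(filename);
    }
}
```

With this shape the caller never has to check which kind of service it was given.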