Re: [PATCH 1/3] PM / devfreq: Actually support providing freq_table
On Mon 23 Apr 19:48 PDT 2018, Chanwoo Choi wrote: > Hi, > > On 2018??? 04??? 24??? 09:20, Bjorn Andersson wrote: > > The code in devfreq_add_device() handles the case where a freq_table is > > passed by the client, but then requests min and max frequences from > > the, in this case absent, opp tables. > > > > Read the min and max frequencies from the frequency table, which has > > been built from the opp table if one exists, instead of querying the > > opp table. > > > > Signed-off-by: Bjorn Andersson> > --- > > > > An alternative approach is to clarify in the devfreq code that it's not > > possible to pass a freq_table and then in patch 3 create an opp table for > > the > > device in runtime; although the error handling of this becomes non-trivial. > > > > Transitioning the UFSHCD to use opp tables directly is hindered by the fact > > that the Qualcomm UFS hardware has two different clocks that needs to be > > running at different rates, so we would need a way to describe the two > > rates in > > the opp table. (And would force us to change the DT binding) > > > > drivers/devfreq/devfreq.c | 22 -- > > 1 file changed, 4 insertions(+), 18 deletions(-) > > > > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c > > index fe2af6aa88fc..086ced50a13d 100644 > > --- a/drivers/devfreq/devfreq.c > > +++ b/drivers/devfreq/devfreq.c > > @@ -74,30 +74,16 @@ static struct devfreq *find_device_devfreq(struct > > device *dev) > > > > static unsigned long find_available_min_freq(struct devfreq *devfreq) > > { > > - struct dev_pm_opp *opp; > > - unsigned long min_freq = 0; > > - > > - opp = dev_pm_opp_find_freq_ceil(devfreq->dev.parent, _freq); > > - if (IS_ERR(opp)) > > - min_freq = 0; > > - else > > - dev_pm_opp_put(opp); > > + struct devfreq_dev_profile *profile = devfreq->profile; > > > > - return min_freq; > > + return profile->freq_table[0]; > > It is wrong. The thermal framework support the devfreq-cooling device > which uses the dev_pm_opp_enable/disable(). > Okay, that makes sense. So rather than registering a custom freq_table I should register the min and max frequency using dev_pm_opp_add(). > In order to find the correct available min frequency, > the devfreq have to use the OPP function instead of using the first entry > of the freq_table array. > Based on this there seems to be room for cleaning out the freq_table from devfreq, to reduce the confusion. I will review this further. Thanks, Bjorn
Re: [PATCH 1/3] PM / devfreq: Actually support providing freq_table
Hi, On 2018년 04월 24일 09:20, Bjorn Andersson wrote: > The code in devfreq_add_device() handles the case where a freq_table is > passed by the client, but then requests min and max frequences from > the, in this case absent, opp tables. > > Read the min and max frequencies from the frequency table, which has > been built from the opp table if one exists, instead of querying the > opp table. > > Signed-off-by: Bjorn Andersson> --- > > An alternative approach is to clarify in the devfreq code that it's not > possible to pass a freq_table and then in patch 3 create an opp table for the > device in runtime; although the error handling of this becomes non-trivial. > > Transitioning the UFSHCD to use opp tables directly is hindered by the fact > that the Qualcomm UFS hardware has two different clocks that needs to be > running at different rates, so we would need a way to describe the two rates > in > the opp table. (And would force us to change the DT binding) > > drivers/devfreq/devfreq.c | 22 -- > 1 file changed, 4 insertions(+), 18 deletions(-) > > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c > index fe2af6aa88fc..086ced50a13d 100644 > --- a/drivers/devfreq/devfreq.c > +++ b/drivers/devfreq/devfreq.c > @@ -74,30 +74,16 @@ static struct devfreq *find_device_devfreq(struct device > *dev) > > static unsigned long find_available_min_freq(struct devfreq *devfreq) > { > - struct dev_pm_opp *opp; > - unsigned long min_freq = 0; > - > - opp = dev_pm_opp_find_freq_ceil(devfreq->dev.parent, _freq); > - if (IS_ERR(opp)) > - min_freq = 0; > - else > - dev_pm_opp_put(opp); > + struct devfreq_dev_profile *profile = devfreq->profile; > > - return min_freq; > + return profile->freq_table[0]; It is wrong. The thermal framework support the devfreq-cooling device which uses the dev_pm_opp_enable/disable(). In order to find the correct available min frequency, the devfreq have to use the OPP function instead of using the first entry of the freq_table array. > } > > static unsigned long find_available_max_freq(struct devfreq *devfreq) > { > - struct dev_pm_opp *opp; > - unsigned long max_freq = ULONG_MAX; > - > - opp = dev_pm_opp_find_freq_floor(devfreq->dev.parent, _freq); > - if (IS_ERR(opp)) > - max_freq = 0; > - else > - dev_pm_opp_put(opp); > + struct devfreq_dev_profile *profile = devfreq->profile; > > - return max_freq; > + return profile->freq_table[profile->max_state - 1]; > } ditto. > > /** > -- Best Regards, Chanwoo Choi Samsung Electronics
[PATCH 0/3] Fix UFS and devfreq interaction
With the introduction of f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency") the UFS host controller driver (UFSHCD) stopped probing for platforms that supports frequency scaling, e.g. all modern Qualcomm platforms. The cause of this was UFSHCD's reliance of not registering any frequencies and then being called by devfreq to switch between the frequencies 0 and UINT_MAX. The devfreq code implies that the client is able to pass the frequency table, instead of relying on opp tables, so the first patch makes this actually work. The second patch extracts the devfreq registration in the UFSHCD driver, both to facilitate the third patch and to remove a dereference of an ERR_PTR() in the case that devfreq registration fails. Finally, the third patch picks the two frequencies from the freq-table provided in UFSHCD and pass these to devfreq, as well as map these frequencies back to the step up/down actions. With this UFS is once again functional on the db820c, and is needed to get UFS working on SDM845 (both tested). Bjorn Andersson (3): PM / devfreq: Actually support providing freq_table scsi: ufs: Extract devfreq registration scsi: ufs: Use freq table with devfreq drivers/devfreq/devfreq.c | 22 +++ drivers/scsi/ufs/ufshcd.c | 68 --- 2 files changed, 57 insertions(+), 33 deletions(-) -- 2.16.2
[PATCH 2/3] scsi: ufs: Extract devfreq registration
Failing to register with devfreq leaves hba->devfreq assigned, which causes the error path to dereference the ERR_PTR(). Rather than bolting on more conditionals, move the call of devm_devfreq_add_device() into it's own function and only update hba->devfreq once it's successfully registered. The subsequent patch builds upon this to make UFS actually work again, as it's been broken since f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency") Signed-off-by: Bjorn Andersson--- drivers/scsi/ufs/ufshcd.c | 31 ++- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 8f22a980b1a7..2253f24309ec 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -1255,6 +1255,26 @@ static struct devfreq_dev_profile ufs_devfreq_profile = { .get_dev_status = ufshcd_devfreq_get_dev_status, }; +static int ufshcd_devfreq_init(struct ufs_hba *hba) +{ + struct devfreq *devfreq; + int ret; + + devfreq = devm_devfreq_add_device(hba->dev, + _devfreq_profile, + "simple_ondemand", + NULL); + if (IS_ERR(devfreq)) { + ret = PTR_ERR(devfreq); + dev_err(hba->dev, "Unable to register with devfreq %d\n", ret); + return ret; + } + + hba->devfreq = devfreq; + + return 0; +} + static void __ufshcd_suspend_clkscaling(struct ufs_hba *hba) { unsigned long flags; @@ -6399,16 +6419,9 @@ static int ufshcd_probe_hba(struct ufs_hba *hba) sizeof(struct ufs_pa_layer_attr)); hba->clk_scaling.saved_pwr_info.is_valid = true; if (!hba->devfreq) { - hba->devfreq = devm_devfreq_add_device(hba->dev, - _devfreq_profile, - "simple_ondemand", - NULL); - if (IS_ERR(hba->devfreq)) { - ret = PTR_ERR(hba->devfreq); - dev_err(hba->dev, "Unable to register with devfreq %d\n", - ret); + ret = ufshcd_devfreq_init(hba); + if (ret) goto out; - } } hba->clk_scaling.is_allowed = true; } -- 2.16.2
[PATCH 3/3] scsi: ufs: Use freq table with devfreq
devfreq requires that the client operates on actual frequencies, not only 0 and UMAX_INT and as such UFS brok with the introduction of f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency"). This patch registers the frequencies of the first clock with devfreq and use these to determine if we're trying to step up or down. Signed-off-by: Bjorn Andersson--- drivers/scsi/ufs/ufshcd.c | 39 --- 1 file changed, 32 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 2253f24309ec..07b1f3c7bd2d 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -1168,16 +1168,13 @@ static int ufshcd_devfreq_target(struct device *dev, struct ufs_hba *hba = dev_get_drvdata(dev); ktime_t start; bool scale_up, sched_clk_scaling_suspend_work = false; + struct list_head *clk_list = >clk_list_head; + struct ufs_clk_info *clki; unsigned long irq_flags; if (!ufshcd_is_clkscaling_supported(hba)) return -EINVAL; - if ((*freq > 0) && (*freq < UINT_MAX)) { - dev_err(hba->dev, "%s: invalid freq = %lu\n", __func__, *freq); - return -EINVAL; - } - spin_lock_irqsave(hba->host->host_lock, irq_flags); if (ufshcd_eh_in_progress(hba)) { spin_unlock_irqrestore(hba->host->host_lock, irq_flags); @@ -1187,7 +1184,13 @@ static int ufshcd_devfreq_target(struct device *dev, if (!hba->clk_scaling.active_reqs) sched_clk_scaling_suspend_work = true; - scale_up = (*freq == UINT_MAX) ? true : false; + if (list_empty(clk_list)) { + spin_unlock_irqrestore(hba->host->host_lock, irq_flags); + goto out; + } + + clki = list_first_entry(>clk_list_head, struct ufs_clk_info, list); + scale_up = (*freq == clki->max_freq) ? true : false; if (!ufshcd_is_devfreq_scaling_required(hba, scale_up)) { spin_unlock_irqrestore(hba->host->host_lock, irq_flags); ret = 0; @@ -1257,11 +1260,33 @@ static struct devfreq_dev_profile ufs_devfreq_profile = { static int ufshcd_devfreq_init(struct ufs_hba *hba) { + struct devfreq_dev_profile *profile; + struct list_head *clk_list = >clk_list_head; + struct ufs_clk_info *clki; struct devfreq *devfreq; int ret; + /* Skip devfreq if we don't have any clocks in the list */ + if (list_empty(clk_list)) + return 0; + + profile = devm_kmemdup(hba->dev, _devfreq_profile, + sizeof(ufs_devfreq_profile), GFP_KERNEL); + if (!profile) + return -ENOMEM; + + profile->max_state = 2; + profile->freq_table = devm_kcalloc(hba->dev, profile->max_state, + sizeof(unsigned long), GFP_KERNEL); + if (!profile->freq_table) + return -ENOMEM; + + clki = list_first_entry(>clk_list_head, struct ufs_clk_info, list); + profile->freq_table[0] = clki->min_freq; + profile->freq_table[1] = clki->max_freq; + devfreq = devm_devfreq_add_device(hba->dev, - _devfreq_profile, + profile, "simple_ondemand", NULL); if (IS_ERR(devfreq)) { -- 2.16.2
[PATCH 1/3] PM / devfreq: Actually support providing freq_table
The code in devfreq_add_device() handles the case where a freq_table is passed by the client, but then requests min and max frequences from the, in this case absent, opp tables. Read the min and max frequencies from the frequency table, which has been built from the opp table if one exists, instead of querying the opp table. Signed-off-by: Bjorn Andersson--- An alternative approach is to clarify in the devfreq code that it's not possible to pass a freq_table and then in patch 3 create an opp table for the device in runtime; although the error handling of this becomes non-trivial. Transitioning the UFSHCD to use opp tables directly is hindered by the fact that the Qualcomm UFS hardware has two different clocks that needs to be running at different rates, so we would need a way to describe the two rates in the opp table. (And would force us to change the DT binding) drivers/devfreq/devfreq.c | 22 -- 1 file changed, 4 insertions(+), 18 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index fe2af6aa88fc..086ced50a13d 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -74,30 +74,16 @@ static struct devfreq *find_device_devfreq(struct device *dev) static unsigned long find_available_min_freq(struct devfreq *devfreq) { - struct dev_pm_opp *opp; - unsigned long min_freq = 0; - - opp = dev_pm_opp_find_freq_ceil(devfreq->dev.parent, _freq); - if (IS_ERR(opp)) - min_freq = 0; - else - dev_pm_opp_put(opp); + struct devfreq_dev_profile *profile = devfreq->profile; - return min_freq; + return profile->freq_table[0]; } static unsigned long find_available_max_freq(struct devfreq *devfreq) { - struct dev_pm_opp *opp; - unsigned long max_freq = ULONG_MAX; - - opp = dev_pm_opp_find_freq_floor(devfreq->dev.parent, _freq); - if (IS_ERR(opp)) - max_freq = 0; - else - dev_pm_opp_put(opp); + struct devfreq_dev_profile *profile = devfreq->profile; - return max_freq; + return profile->freq_table[profile->max_state - 1]; } /** -- 2.16.2
[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete
https://bugzilla.kernel.org/show_bug.cgi?id=199435 --- Comment #15 from Don (don.br...@microsemi.com) --- (In reply to Anthony Hausman from comment #11) > The only patch that I'm sure that I have is the "scsi: hpsa: fix selection > of reply queue" one. > For the I'm using an out of the box 4.11 kernel. So I'm really not sure that > the other patches are present. > > > Unfortunately, the module does not compile using 4.11.0-14-generic headers. > > # make -C /lib/modules/4.11.0-14-generic/build M=$(pwd) > --makefile="/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt" > make: Entering directory '/usr/src/linux-headers-4.11.0-14-generic' > make -C /lib/modules/4.4.0-96-generic/build > M=/usr/src/linux-headers-4.11.0-14-generic EXTRA_CFLAGS+=-DKCLASS4A modules > make[1]: Entering directory '/usr/src/linux-headers-4.4.0-96-generic' > make[2]: *** No rule to make target 'kernel/bounds.c', needed by > 'kernel/bounds.s'. Stop. > Makefile:1423: recipe for target > '_module_/usr/src/linux-headers-4.11.0-14-generic' failed > make[1]: *** [_module_/usr/src/linux-headers-4.11.0-14-generic] Error 2 > make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-96-generic' > /root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt:96: recipe for > target 'default' failed > make: *** [default] Error 2 > make: Leaving directory '/usr/src/linux-headers-4.11.0-14-generic' > > But if you tell me the principal problem is using the 4.11 kernel, I can > upgrade it to use the 4.16.3 kernel. > > If I use it, must I use the out of box 3.4.20-136 hpsa driver or use your > precedent patch on the last 3.4.20-125? (In reply to Anthony Hausman from comment #11) > The only patch that I'm sure that I have is the "scsi: hpsa: fix selection > of reply queue" one. > For the I'm using an out of the box 4.11 kernel. So I'm really not sure that > the other patches are present. > > > Unfortunately, the module does not compile using 4.11.0-14-generic headers. > > # make -C /lib/modules/4.11.0-14-generic/build M=$(pwd) > --makefile="/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt" > make: Entering directory '/usr/src/linux-headers-4.11.0-14-generic' > make -C /lib/modules/4.4.0-96-generic/build > M=/usr/src/linux-headers-4.11.0-14-generic EXTRA_CFLAGS+=-DKCLASS4A modules > make[1]: Entering directory '/usr/src/linux-headers-4.4.0-96-generic' > make[2]: *** No rule to make target 'kernel/bounds.c', needed by > 'kernel/bounds.s'. Stop. > Makefile:1423: recipe for target > '_module_/usr/src/linux-headers-4.11.0-14-generic' failed > make[1]: *** [_module_/usr/src/linux-headers-4.11.0-14-generic] Error 2 > make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-96-generic' > /root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt:96: recipe for > target 'default' failed > make: *** [default] Error 2 > make: Leaving directory '/usr/src/linux-headers-4.11.0-14-generic' > > But if you tell me the principal problem is using the 4.11 kernel, I can > upgrade it to use the 4.16.3 kernel. > > If I use it, must I use the out of box 3.4.20-136 hpsa driver or use your > precedent patch on the last 3.4.20-125? The 4.16.3 driver should be OK to use. You could not untar the sources I gave you in /tmp and build with make -f Makefile.alt? If you copy the source code into the kernel tree, you should be able to do make modules SUBDIRS=drivers/scsi hpsa.ko -- You are receiving this mail because: You are the assignee for the bug.
God dag.
God dag, Jeg er administrerende direktør i industri- og handelsbanken i Kina (ICBC). Jeg har et gjensidig forretningsforslag, som refererer til overføring av a stor mengde penger til en konto i utlandet, med din hjelp som utlending partner som mottaker av midlene. Alt om denne transaksjonen vil bli gjort lovlig uten noen bro av økonomisk myndighet både i min Land og ditt. Hvis du er interessert, vennligst svar tilbake gjennom min privat e-post skrevet nedenfor, og jeg vil gi deg mer informasjon og prosjekt så snart jeg mottar det positive svaret ditt. Privat e-post: direc...@chengfengchao.com Med vennlig hilsen, Daglig leder. ICBC. Kina
Re: [PATCH] bsg referencing bus driver module
Thanks, James. The idea of cutting communications with Scsi_Host at bsg_unregister_queue(..) time and leaving bsg_class_device to its own fate makes a lot of sense, conceptually. But there are implementation issues that are difficult to work around. bsg.c creates bsg_class_device and takes a reference to Scsi_Host at bsg_register_queue(..) time. The reference is dropped at bsg_class_device's release(..) function. If the driver implementing Scsi_Host template is not around we crash. We could move the reference drop from bsg_class_device's release(..) function to bsg_unregister_queue(..). That would be a small change in bsg.c. But bsg.c sets Scsi_Host as the parent of bsg_class_device's device. We cannot have a device around with a dangling parent. A device's parent cannot be changed dynamically. Not setting the device's parent at creation may affect software relying on bsg_class_device - Scsi_Host child-parent relations. It looks like I am out of options. Do you have suggestions on how to work around Scsi_Host being bsg_class_device's parent?
RE: [PATCH] [SCSI] mpt3sas: Fix secure erase premature termination (v4)
Hi, We are running kernel 4.4.0-22 and the patch below does not seem to be present in the mpt3sas driver. Can you please confirm? As a reminder the patch was related to a Security Erase ATA command that requires a very long timeout like 100 minutes or more and the drive retains a busy status. And the driver should not try to send other commands or reset the drive. Thanks, Igor Rybak CTO MediaClone Inc 6900 Canby Ave Ste 107 Reseda, CA 91335 USA +1-818-654-6286 From: Sreekanth Reddy [sreekanth.re...@broadcom.com] Sent: Thursday, November 10, 2016 8:38 PM To: Andrey Grodzovsky Cc: PDL-MPT-FUSIONLINUX; Igor Rybak; Ezra Kohavi; linux-scsi@vger.kernel.org; Sathya Prakash; Chaitra P B; Suganath Prabu Subramani; Hannes Reinecke; sta...@vger.kernel.org Subject: Re: [PATCH] [SCSI] mpt3sas: Fix secure erase premature termination (v4) On Thu, Nov 10, 2016 at 8:05 PM, Andrey Grodzovskywrote: > Problem: > This is a work around for a bug with LSI Fusion MPT SAS2 when > pefroming secure erase. Due to the very long time the operation > takes commands issued during the erase will time out and will trigger > execution of abort hook. Even though the abort hook is called for > the specific command which timed out this leads to entire device halt > (scsi_state terminated) and premature termination of the secured erase. > > Fix: > Set device state to busy while erase in progress to reject any incoming > commands until the erase is done. The device is blocked any way during > this time and cannot execute any other command. > More data and logs can be found here - > https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view > > v2: Update according to example patch by Hannes Reinecke to apply > the blocking logic to any ATA 12/16 command. > > v3: Use SCSI commands opcodes definitions instead of value and > correct identation. > > v4: Fix checkpath errors and warning. > > Signed-off-by: Andrey Grodzovsky > Cc: > Cc: Sathya Prakash > Cc: Chaitra P B > Cc: Suganath Prabu Subramani > Cc: Sreekanth Reddy > Cc: Hannes Reinecke > Cc: Acked-by: Sreekanth Reddy > --- > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index 5a97e32..c032319 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -3500,6 +3500,10 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 > ioc_status) > SAM_STAT_CHECK_CONDITION; > } > > +static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) > +{ > + return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16); > +} > > /** > * _scsih_qcmd - main scsi request entry point > @@ -3528,6 +3532,14 @@ _scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd > *scmd) > scsi_print_command(scmd); > #endif > > + /** > + * Lock the device for any subsequent command until > + * command is done. > + */ > + if (ata_12_16_cmd(scmd)) > + scsi_internal_device_block(scmd->device); > + > + > sas_device_priv_data = scmd->device->hostdata; > if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { > scmd->result = DID_NO_CONNECT << 16; > @@ -4062,6 +4074,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, > u8 msix_index, u32 reply) > if (scmd == NULL) > return 1; > > + if (ata_12_16_cmd(scmd)) > + scsi_internal_device_unblock(scmd->device, SDEV_RUNNING); > + > + > mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); > > if (mpi_reply == NULL) { > -- > 2.1.4 >
Re: [RESEND PATCH v1 2/2] trace: events: block: Add tag in block trace events
On Mon, 23 Apr 2018 14:43:13 +0200 Steffen Maierwrote: > > - TP_printk("[%s] %d", __entry->comm, __entry->nr_rq) > > + TP_printk("[%s] %d %s", __entry->comm, __entry->nr_rq, > > + __entry->explicit ? "Sync" : "Async") > > ); > > > > /** > > This entire hunk does not seem related to this patch description. > Also, I'm not sure trace-cmd and perf et al. could format it accordingly. You mean the "?:" operation? trace-cmd and perf can handle it fine. Just look at the trace event irq_handler_exit: print fmt: "irq=%d ret=%s", REC->irq, REC->ret ? "handled" : "unhandled" # trace-cmd record -e irq_handler_exit # trace-cmd report -0 [001] 856960.382767: irq_handler_exit: irq=29 ret=handled -0 [001] 856961.745640: irq_handler_exit: irq=29 ret=handled -0 [001] 856961.865762: irq_handler_exit: irq=29 ret=handled -- Steve
Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events
On 04/17/2018 12:00 PM, Bean Huo (beanhuo) wrote: #Cat trace iozone-4055 [000] 665.039276: block_unplug: [iozone] 1 Sync iozone-4055 [000] ...1 665.039278: block_rq_insert: 8,48 WS 0 () 39604352 + 128 tag=18 [iozone] iozone-4055 [000] ...1 665.039280: block_rq_issue: 8,48 WS 0 () 39604352 + 128 tag=18 [iozone] iozone-4055 [000] ...1 665.039284: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=3 data_sgl=16 prot_sgl=0 prot_op=SCSI_PROT_NORMAL tag=18 cmnd=(WRITE_10 lba=4950544 txlen=16 protect=0 raw=2a 00 00 4b 8a 10 00 00 10 00) iozone-4056 [002] 665.039284: block_dirty_buffer: 8,62 sector=44375 size=4096 -0 [000] d.h2 665.039319: scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=3 data_sgl=16 prot_sgl=0 prot_op=SCSI_PROT_NORMAL tag=24 cmnd=(WRITE_10 lba=4944016 txlen=16 protect=0 raw=2a 00 00 4b 70 90 00 00 10 00) result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD) -0 [000] d.h3 665.039321: block_rq_complete: 8,48 WS () 39552128 + 128 tag=24 [0] iozone-4058 [003] 665.039362: block_bio_remap: 8,48 WS 39568768 + 128 <- (8,62) 337280 iozone-4058 [003] 665.039364: block_bio_queue: 8,48 WS 39568768 + 128 [iozone] iozone-4058 [003] ...1 665.039366: block_getrq: 8,48 WS 39568768 + 128 [iozone] I'm not familiar with block/scsi command tagging. Some block events now would get a tag field. Some block events would not get a tag field (maybe because for some the tag is not (yet) known). So all block events that belong to the same request would still need to be correlated by something like (devt, RWBS, LBA, length) because not all have a tag field. Especially, the ftrace log with tag information, I can easily figure out one I/O request between block layer and SCSI. Provided this is done correctly, I would be in favor of a solution. Since v4.11 commit 48b77ad60844 (``block: cleanup tracing'')\newline v4.11 commit 82ed4db499b8 (``block: split scsi\_request out of struct request'') we don't have the SCSI CDB in block traces (nor in blktrace traditional relayfs trace format, nor in ftrace 'blk' tracer binary synthesized output [1]) any more (empty Packet Command payload). Being able to correlate block events with scsi events would indeed be very helpful for some cases. Is a correlation between block and scsi only necessary for these pairs?: block_rq_issue causes scsi_dispatch_cmd_start, and scsi_dispatch_cmd_done causes block_rq_complete. If so, only those two block trace events would need to get a new field? [1] v2.6.30 commit 08a06b83ff8b (``blkftrace: binary tracing, synthesizing old format'') v2.6.31 commit f3948f8857ef (``blktrace: fix context-info when mixed-using blk tracer and trace events'') -- Mit freundlichen Grüßen / Kind regards Steffen Maier Linux on z Systems Development IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
Re: [RESEND PATCH v1 2/2] trace: events: block: Add tag in block trace events
On 04/16/2018 04:33 PM, Bean Huo (beanhuo) wrote: Print the request tag along with other information in block trace events when tracing request , and unplug type (Sync / Async). Signed-off-by: Bean Huo--- include/trace/events/block.h | 36 +--- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/include/trace/events/block.h b/include/trace/events/block.h index 81b43f5..f8c0b9e 100644 --- a/include/trace/events/block.h +++ b/include/trace/events/block.h @@ -478,15 +486,18 @@ DECLARE_EVENT_CLASS(block_unplug, TP_STRUCT__entry( __field( int, nr_rq ) + __field( bool, explicit) __array( char, comm, TASK_COMM_LEN ) ), TP_fast_assign( __entry->nr_rq = depth; + __entry->explicit = explicit; memcpy(__entry->comm, current->comm, TASK_COMM_LEN); ), - TP_printk("[%s] %d", __entry->comm, __entry->nr_rq) + TP_printk("[%s] %d %s", __entry->comm, __entry->nr_rq, + __entry->explicit ? "Sync" : "Async") ); /** This entire hunk does not seem related to this patch description. Also, I'm not sure trace-cmd and perf et al. could format it accordingly. See also my patch for this same functionality: https://www.spinics.net/lists/linux-block/msg24691.html ("[PATCH v2 1/2] tracing/events: block: track and print if unplug was explicit or schedule") -- Mit freundlichen Grüßen / Kind regards Steffen Maier Linux on z Systems Development IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status
On 04/19/2018 10:18 PM, Omar Sandoval wrote: > On Thu, Apr 19, 2018 at 01:44:41PM -0600, Jens Axboe wrote: >> On 4/19/18 1:41 PM, Bart Van Assche wrote: >>> On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote: On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote: > Thanks for the test! Applied. Side note, it's unfortunate that this test takes 180 seconds to run only because we have to wait for the command timeout. We should be able to export request_queue->rq_timeout writeable in sysfs. Would you be interested in doing that? >>> >>> Hello Omar, >>> >>> Is this perhaps what you are looking for? >>> # ls -l /sys/class/scsi_device/*/*/timeout >>> -rw-r--r-- 1 root root 4096 Apr 19 08:52 >>> /sys/class/scsi_device/2:0:0:0/device/timeout >>> -rw-r--r-- 1 root root 4096 Apr 19 12:39 >>> /sys/class/scsi_device/8:0:0:1/device/timeout >> >> We should have it generically available though, not just for SCSI. In >> retrospect, it should have been under queue/ from the start, now we'll >> end up with duplicate entries for SCSI. > > For the sake of this test, I just decreased the timeout through SCSI. Great idea. > echo 5 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/timeout" However, the timeout should be sufficiently larger than scsi_debug/delay, in order not to run into the command timeout. It may be unfortunate that scsi_debug/delay uses jiffies as unit and can thus differ in a range of an order of magnitude for different kernel configs. > # delay to reduce response repetition: around 1..10sec depending on HZ > echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay On s390, we typically have HZ=100, so 1000 jiffies are 10 seconds. We can increase the sdev cmd timeout or decrease the scsi_debug/delay. 100 instead of 1000 for scsi_debug/delay worked for me; but for some reason the loop checking for busy did not work (any more?) causing an unexpected test case error: > # ./check scsi/004 > scsi/004 (ensure repeated TASK SET FULL results in EIO on timing out command) > [failed] > runtime 31.892s ... 31.720s > --- tests/scsi/004.out2018-04-16 11:47:19.105931872 +0200 > +++ results/nodev/scsi/004.out.bad2018-04-23 14:07:33.615445253 > +0200 > @@ -1,3 +1,3 @@ > Running scsi/004 > -Input/output error > +modprobe: FATAL: Module scsi_debug is in use. > Test complete so I added another sleep hack: # dd closing SCSI disk causes implicit TUR also being delayed once +# sleep over time window where READ was done and TUR not yet queued +sleep 2 while grep -q -F "in_use_bm BUSY:" "/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do What do you think? -- Mit freundlichen Grüßen / Kind regards Steffen Maier Linux on z Systems Development IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
RE: [PATCH v2 01/14] mpt3sas: Bug fix for big endian systems.
Martin, Please see my replies inline. Thanks, Chaitra -Original Message- From: Martin K. Petersen [mailto:martin.peter...@oracle.com] Sent: Saturday, April 21, 2018 3:52 AM To: Chaitra P B Cc: linux-scsi@vger.kernel.org; sathya.prak...@broadcom.com; sreekanth.re...@broadcom.com; suganath-prabu.subram...@broadcom.com Subject: Re: [PATCH v2 01/14] mpt3sas: Bug fix for big endian systems. Chaitra, A few comments: > @@ -426,7 +427,7 @@ static void _clone_sg_entries(struct MPT3SAS_ADAPTER *ioc, > dst_addr_phys = _base_get_chain_phys(ioc, > smid, sge_chain_count); > WARN_ON(dst_addr_phys > U32_MAX); > - sgel->Address = (u32)dst_addr_phys; > + sgel->Address = cpu_to_le32((u32)dst_addr_phys); I tend to prefer lower_32_bits() but that's your choice. Accepted, lower_32_bits() can be used here. > @@ -3040,8 +3047,9 @@ mpt3sas_base_map_resources(struct MPT3SAS_ADAPTER *ioc) > } > > for (i = 0; i < ioc->combined_reply_index_count; i++) { > - ioc->replyPostRegisterIndex[i] = (resource_size_t *) > - ((u8 *)>chip->Doorbell + > + ioc->replyPostRegisterIndex[i] = > + (volatile void __iomem *) > + ((u8 __force *)>chip->Doorbell + >MPI25_SUP_REPLY_POST_HOST_INDEX_OFFSET + >(i * MPT3_SUP_REPLY_POST_HOST_INDEX_REG_OFFSET)); Do you really need volatile here? The existing resource_size_t didn't imply volatile. Why the double type casts? You've already changed replyPostRegisterIndex to be 'volatile void __iomem **' in the header file. So why not: ioc->replyPostRegisterIndex[i] = >chip->Doorbell + MPI25_SUP_REPLY_POST_HOST_INDEX_OFFSET + i * MPT3_SUP_REPLY_POST_HOST_INDEX_REG_OFFSET; Also looks like ioc->reply_post_host_index handling a few lines further down could lose the type casts. Accepted, volatile is not really needed. I shall remove volatile. > @@ -3386,7 +3394,7 @@ _base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle) > __le32 *mfp = (__le32 *)mpt3sas_base_get_msg_frame(ioc, smid); > > _clone_sg_entries(ioc, (void *) mfp, smid); > - mpi_req_iomem = (void *)ioc->chip + > + mpi_req_iomem = (void __force *)ioc->chip + > MPI_FRAME_START_OFFSET + (smid * ioc->request_sz); > _base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp, > ioc->request_sz); Wouldn't it be better to add __iomem to the definition of mpi_req_iomem? With this change I still see the below warnings: warning: cast removes address space of expression warning: incorrect type in assignment (different address spaces) expected void [noderef] *mpi_req_iomem got void * warning: incorrect type in argument 1 (different address spaces) expected void *dst_iomem got void [noderef] *mpi_req_iome > + nvme_encap_request->ErrorResponseBaseAddress = > + cpu_to_le64(ioc->sense_dma & 0xUL); upper_32_bits()? since upper_32_bits() returns only upper 32 bits. But here after bitwise & below we are doing bitwise | with dma_address lower 32 bits , so in this case use of upper_32_bits() will yield wrong address for below assignment. Hence upper_32_bits() can't be used. nvme_encap_request->ErrorResponseBaseAddress |= cpu_to_le64(le32_to_cpu( mpt3sas_base_get_sense_buffer_dma(ioc, smid))); -- Martin K. Petersen Oracle Linux Engineering
[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete
https://bugzilla.kernel.org/show_bug.cgi?id=199435 --- Comment #14 from Anthony Hausman (anthonyhaussm...@gmail.com) --- Indeed I have a charged battery (capacitor) and the writeback-cache enabled. I run the hp-health component too, I have already try to disable it on the 4.11 kernel and have reproduced the problem of load without it. The cma related call trace up after the logical drive reset is called. Right now, I test on a server the kernel 4.16.3-041603-generic with the hpsa module with the patch to use local work-queue insead of system work-queue. Right now I didn't reproduce the problem. I had a disk with bad blocks (before launching a read-only test badblocks returned a lot of block error) but since I have upgraded the kernel with the patch hpsa module I have no more error. I'm still trying to reproduce the problem by launching a badblocks read-only test on the "ex-faulty" disk. I'll tell you the result of the test. -- You are receiving this mail because: You are the assignee for the bug.
Re: MegaCli fails to communicate with Raid-Controller
> On 23. Apr 2018, at 10:28, Kashyap Desaiwrote >> >> Just did that on the “Dell R730xd, MegaRAID SAS-3 3108” and get the >> following output when the megacli works fine. >> >> ### >> Apr 23 09:31:37 xh643 kernel: [ 368.319092] GD IOV-len: 2048 Apr 23 >> 09:31:37 >> xh643 kernel: [ 368.319426] GD IOV-len: 32 Apr 23 09:31:37 xh643 kernel: >> [ >> 368.319563] GD IOV-len: 320 Apr 23 09:31:37 xh643 kernel: [ 368.319698] >> GD >> IOV-len: 616 Apr 23 09:31:37 xh643 kernel: [ 368.319887] GD IOV-len: 1664 >> Apr 23 09:31:37 xh643 kernel: [ 368.320040] GD IOV-len: 32 Apr 23 >> 09:31:37 >> xh643 kernel: [ 368.320174] GD IOV-len: 8 … ### >> >> Full output is attached in iov_len_megacli_works.txt, it also contains the >> output of /proc/buddyinfo which might be important based in my research so >> far. > > We need similar output whenever there is a dma_alloc_coherent() failure. > Did you added new prints in failure of dma_alloc_coherent() OR it is generic > print for all the case ? Right now its generic for every iteration of the loop ### 6943 for (i = 0; i < ioc->sge_count; i++) { 6944 if (!ioc->sgl[i].iov_len) 6945 continue; 6946 6947 printk("GD IOV-len: %d\n", ioc->sgl[i].iov_len); 6948 6949 kbuff_arr[i] = dma_alloc_coherent(>pdev->dev, ### I will add the printk to dma_alloc_coherent() as well to see, which request actually fails. But i have to be a bit patient since its a production system and the customers aren’t to happy about reboots. I have tried to build live-patch kernel to be more flexible with changes, but the built failed with ### ... /root/kpatch/kpatch-build/create-diff-object: unreconcilable difference ... ### but thats out of focus for this list and just fyi :-) signature.asc Description: Message signed with OpenPGP
RE: MegaCli fails to communicate with Raid-Controller
> > Interesting. What is considered old and new? I have a third machine "Dell > R515, MegaRAID SAS 2108”, is that considered new? Its running the same > Xen/Kernel/Megacli-versions as the other two, but the error does not > occur. Nope this is also old controller. When I say new controller, It is pretty much active development like SAS3.0 and SAS3.5. Driver level changes related to DMA mask settings is FW dependent, so we cannot open it for all. > > > There can be a two possibilities. > > > > 1. This is actual memory allocation failure due to system resource > > issue. > > I have not seen any OOMs on the two machines when/where the SGL-error > occurs. According to "xl info” and our munin-graphs it all looks ok with a > couple 100 MiB “free". > > > > 2. IOCLT provided large memory length in iov and dma buffer allocation > > from below API failed due to large memory chunk requested. > > > >kbuff_arr[i] = dma_alloc_coherent(>pdev->dev, > >ioc->sgl[i].iov_len, > >_handle, > > GFP_KERNEL); > > > > Can you change driver code *printk* to dump iov_len ? Just to confirm. > > Just did that on the “Dell R730xd, MegaRAID SAS-3 3108” and get the > following output when the megacli works fine. > > ### > Apr 23 09:31:37 xh643 kernel: [ 368.319092] GD IOV-len: 2048 Apr 23 > 09:31:37 > xh643 kernel: [ 368.319426] GD IOV-len: 32 Apr 23 09:31:37 xh643 kernel: > [ > 368.319563] GD IOV-len: 320 Apr 23 09:31:37 xh643 kernel: [ 368.319698] > GD > IOV-len: 616 Apr 23 09:31:37 xh643 kernel: [ 368.319887] GD IOV-len: 1664 > Apr 23 09:31:37 xh643 kernel: [ 368.320040] GD IOV-len: 32 Apr 23 > 09:31:37 > xh643 kernel: [ 368.320174] GD IOV-len: 8 … ### > > Full output is attached in iov_len_megacli_works.txt, it also contains the > output of /proc/buddyinfo which might be important based in my research so > far. We need similar output whenever there is a dma_alloc_coherent() failure. Did you added new prints in failure of dma_alloc_coherent() OR it is generic print for all the case ?